Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Foundations of statistics wikipedia , lookup
History of statistics wikipedia , lookup
Linear least squares (mathematics) wikipedia , lookup
Degrees of freedom (statistics) wikipedia , lookup
Bootstrapping (statistics) wikipedia , lookup
Taylor's law wikipedia , lookup
Student's t-test wikipedia , lookup
DMBA: Statistics Lecture 2: Simple Linear Regression Least Squares, SLR properties, Inference, and Forecasting Carlos Carvalho The University of Texas McCombs School of Business mccombs.utexas.edu/faculty/carlos.carvalho/teaching 1 Today’s Plan 1. The Least Squares Criteria 2. The Simple Linear Regression Model 3. Estimation for the SLR Model I sampling distributions I confidence intervals I hypothesis testing 2 Linear Prediction Ŷi = b0 + b1 Xi I b0 is the intercept and b1 is the slope I We find b0 and b1 using Least Squares 3 The Least Squares Criterion The formulas for b0 and b1 that minimize the least squares criterion are: b1 = corr (X , Y ) × sY sX b0 = Ȳ − b1 X̄ where, v u n uX 2 Yi − Ȳ sY = t i=1 v u n uX 2 and sX = t Xi − X̄ i=1 4 Review: Covariance and Correlation Correlation and Covariance Measure the direction and strength of the linear relationship 56$23"$()4".1 , -"./01"$23"$direction .4*$strength between variables Y and X 1"(.2)54/3)7$8"29""4$23"$:.1).8("/$;$.4*$< The direction i the sign of the c Applied Regression n Analysis i Carlos M. Carvalhoi=1 P Cov (Y , X ) = (Y − Ȳ )(Xi − X̄ ) n−1 5 Correlation and Covariance Correlation is the standardized covariance: corr(X , Y ) = p cov(X , Y ) var(X )var(Y ) = cov(X , Y ) sd(X )sd(Y ) The correlation is scale invariant and the units of measurement don’t matter: It is always true that −1 ≤ corr(X , Y ) ≤ 1. This gives the direction (- or +) and strength (0 → 1) of the linear relationship between X and Y . 6 3 3 Correlation corr = .5 2 1 0 -1 -2 -3 -3 -2 -1 0 1 2 corr = 1 -1 0 1 2 3 -3 -2 -1 0 1 2 3 3 -2 3 -3 1 0 -1 -2 -3 -3 -2 -1 0 1 2 corr = -.8 2 corr = .8 -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 7 Correlation Only measures linear relationships: corr(X , Y ) = 0 does not mean the variables are not related! corr = 0.72 -8 0 -6 5 -4 10 -2 15 0 20 corr = 0.01 -3 -2 -1 0 1 2 0 5 10 15 20 Also be careful with influential observations. 8 Back to Least Squares 1. Intercept: b0 = Ȳ − b1 X̄ ⇒ Ȳ = b0 + b1 X̄ I The point (X̄ , Ȳ ) is on the regression line! I Least squares finds the point of means and rotate the line through that point until getting the “right” slope 2. Slope: b1 = corr (X , Y ) × I sY sX So, the right slope is the correlation coefficient times a scaling factor that ensures the proper units for b1 9 More on Least Squares From now on, terms “fitted values” (Ŷi ) and “residuals” (ei ) refer to those obtained from the least squares line. The fitted values and residuals have some special properties. Lets look at the housing data analysis to figure out what these properties are... 10 120 100 corr(y.hat, x) = 1 80 Fitted Values 140 160 The Fitted Values and X 1.0 1.5 2.0 2.5 3.0 3.5 X 11 30 The Residuals and X -10 0 10 mean(e) = 0 -20 Residuals 20 corr(e, x) = 0 1.0 1.5 2.0 2.5 3.0 3.5 X 12 Why? What is the intuition for the relationship between Ŷ and e and X ? 100 120 Crazy line: 10 + 50 X 80 LS line: 38.9 + 35.4 X 60 Y 140 160 Lets consider some “crazy”alternative line: 1.0 1.5 2.0 2.5 3.0 3.5 X 13 Fitted Values and Residuals This is a bad fit! We are underestimating the value of small houses 30 and overestimating the value of big houses. -10 0 10 mean(e) = 1.8 -20 Crazy Residuals 20 corr(e, x) = -0.7 1.0 1.5 2.0 2.5 3.0 3.5 X Clearly, we have left some predictive ability on the table! 14 Fitted Values and Residuals As long as the correlation between e and X is non-zero, we could always adjust our prediction rule to do better. We need to exploit all of the predictive power in the X values and put this into Ŷ , leaving no “Xness” in the residuals. In Summary: Y = Ŷ + e where: I Ŷ is “made from X ”; corr(X , Ŷ ) = 1. I e is unrelated to X ; corr(X , e) = 0. 15 Another way to derive things The intercept: n n 1X 1X ei = 0 ⇒ (Yi − b0 − b1 Xi ) n i=1 n i=1 ⇒ Ȳ − b0 − b1 X̄ = 0 ⇒ b0 = Ȳ − b1 X̄ 16 Another way to derive things The slope: corr(e, X ) = n X ei (Xi − X̄ ) = 0 i=1 = n X (Yi − b0 − b1 Xi )(Xi − X̄ ) i=1 n X = (Yi − Ȳ − b1 (Xi − X̄ ))(Xi − X̄ ) i=1 Pn ⇒ b1 = i=1 (Xi − X̄ )(Yi − Pn 2 i=1 (Xi − X̄ ) Ȳ ) = rxy sy sx 17 Decomposing the Variance How well does the least squares line explain variation in Y ? Since Ŷ and e are independent (i.e. cov(Ŷ , e) = 0), var(Y ) = var(Ŷ + e) = var(Ŷ ) + var(e) This leads to n X i=1 (Yi − Ȳ )2 = n n X X (Ŷi − Ȳ )2 + ei2 i=1 i=1 18 Decomposing the Variance – ANOVA Tables SSR: Variation in Y explained by the regression line. SSE: Variation in Y that is left unexplained. SSR = SST ⇒ perfect fit. Be careful of similar acronyms; e.g. SSR for “residual” SS. 19 Decomposing Variance – The ANOVATables Table Decomposingthethe Variance – ANOVA Applied Regression Analysis – Fall 2008 20 A Goodness of Fit Measure: R 2 The coefficient of determination, denoted by R 2 , measures goodness of fit: R2 = SSR SSE =1− SST SST I 0 < R 2 < 1. I The closer R 2 is to 1, the better the fit. 21 A Goodness of Fit Measure: R 2 2 ( i.e., R 2 is squared correlation). An interesting fact: R 2 = rxy R 2 = = = Pn (Ŷi − Ȳ )2 Pi=1 n (Yi − Ȳ )2 Pni=1 2 i=1 (b0 + b1 Xi − b0 − b1 X̄ ) Pn 2 i=1 (Yi − Ȳ ) P b12 ni=1 (Xi − X̄ )2 b12 sx2 2 Pn = = rxy 2 2 s (Y − Ȳ ) i y i=1 No surprise: the higher the sample correlation between X and Y , the better you are doing in your regression. 22 BackBack to the House to the HouseData Data ''- ''/ Applied Regression Analysis Carlos M. Carvalho ''. !""#$%%&$'()*"$+, 23 Prediction and the Modelling Goal A prediction rule is any function where you input X and it outputs Ŷ as a predicted response at X . The least squares line is a prediction rule: Ŷ = f (X ) = b0 + b1 X 24 Prediction and the Modelling Goal Ŷ is not going to be a perfect prediction. We need to devise a notion of forecast accuracy. 25 Prediction and the Modelling Goal There are two things that we want to know: I What value of Y can we expect for a given X? I How sure are we about this forecast? Or how different could Y be from what we expect? Our goal is to measure the accuracy of our forecasts or how much uncertainty there is in the forecast. One method is to specify a range of Y values that are likely, given an X value. Prediction Interval: probable range for Y-values given X 26 Prediction and the Modelling Goal Key Insight: To construct a prediction interval, we will have to assess the likely range of residual values corresponding to a Y value that has not yet been observed! We will build a probability model (e.g., normal distribution). Then we can say something like “with 95% probability the residuals will be no less than -$28,000 or larger than $28,000”. We must also acknowledge that the “fitted” line may be fooled by particular realizations of the residuals. 27 The Simple Linear Regression Model The power of statistical inference comes from the ability to make precise statements about the accuracy of the forecasts. In order to do this we must invest in a probability model. Simple Linear Regression Model: Y = β0 + β1 X + ε ε ∼ N(0, σ 2 ) The error term ε is independent “idosyncratic noise”. 28 Independent Normal Additive Error Why do we have ε ∼ N(0, σ 2 )? I E [ε] = 0 ⇔ E [Y | X ] = β0 + β1 X (E [Y | X ] is “conditional expectation of Y given X ”). I Many things are close to Normal (central limit theorem). I MLE estimates for β’s are the same as the LS b’s. I It works! This is a very robust model for the world. We can think of β0 + β1 X as the “true” regression line. 29 The Regression Model and our House Data Think of E [Y |X ] as the average price of houses with size X: Some houses could have a higher than expected value, some lower, and the true line tells us what to expect on average. The error term represents influence of factors other X . 30 Conditional Distributions The conditional distribution for Y given X is Normal: Y |X ∼ N(β0 + β1 X , σ 2 ). σ controls dispersion: 31 Conditional vs Marginal Distributions More on the conditional distribution: Y |X ∼ N(E [Y |X ], var(Y |X )). I Mean is E [Y |X ] = E [β0 + β1 X + ε] = β0 + β1 X . I Variance is var(β0 + β1 X + ε) = var(ε) = σ 2 . Remember our sliced boxplots: I σ 2 < var(Y ) if X and Y are related. 32 Prediction Intervals with the True Model You are told (without looking at the data) that β0 = 40; β1 = 45; σ = 10 and you are asked to predict price of a 1500 square foot house. What do you know about Y from the model? Y = 40 + 45(1.5) + ε = 107.5 + ε Thus our prediction for price is Y ∼ N(107.5, 102 ) 33 Prediction Intervals with the True Model The model says that the mean value of a 1500 sq. ft. house is $107,500 and that deviation from mean is within ≈ $20,000. We are 95% sure that I −20 < ε < 20 I $87, 500 < Y < $127, 500 In general, the 95 % Prediction Interval is PI = β0 + β1 X ± 2σ. 34 Summary of Simple Linear Regression Assume that all observations are drawn from our regression model and that errors on those observations are independent. The model is Yi = β0 + β1 Xi + εi where ε is independent and identically distributed N(0, σ 2 ). The SLR has 3 basic parameters: I β0 , β1 (linear pattern) I σ (variation around the line). 35 Key Characteristics of Linear Regression Model I Mean of Y is linear in X . I Error terms (deviations from line) are normally distributed (very few deviations are more than 2 sd away from the regression mean). I Error terms have constant variance. 36 Break Back in 15 minutes... 37 Recall: Estimation for the SLR Model SLR assumes every observation in the dataset was generated by the model: Yi = β0 + β1 Xi + εi This is a model for the conditional distribution of Y given X. We use Least Squares to estimate β0 and β1 : Pn β̂1 = b1 = i=1 (Xi − X̄ )(Yi − Pn 2 i=1 (Xi − X̄ ) Ȳ ) β̂0 = b0 = Ȳ − b1 X̄ 38 Estimation for the SLR Model 39 Estimation of Error Variance iid Recall that εi ∼ N(0, σ 2 ), and that σ drives the width of the prediction intervals: σ 2 = var(εi ) = E [(εi − E [εi ])2 ] = E [ε2i ] A sensible strategy would be to estimate the average for squared errors with the sample average squared residuals: n σ̂ 2 = 1X 2 ei n i=1 40 Estimation of Error Variance However, this is not an unbiased estimator of σ 2 . We have to alter the denominator slightly: n 1 X 2 SSE s = ei = n−2 n−2 2 i=1 (2 is the number of regression coefficients; i.e. 2 for β0 + β1 ). We have n − 2 degrees of freedom because 2 have been “used up” in the estimatiation of b0 and b1 . We usually use s = p SSE /(n − 2), in the same units as Y . 41 Degrees of Freedom Degrees of Freedom is the number of times you get to observe useful information about the variance you’re trying to estimate. For example, consider SST = I Pn i=1 (Yi − Ȳ )2 : If n = 1, Ȳ = Y1 and SST = 0: since Y1 is “used up” estimating the mean, we haven’t observed any variability! I For n > 1, we’ve only had n − 1 chances for deviation from the mean, and we estimate sy2 = SST /(n − 1). In regression with p coefficients (e.g., p = 2 in SLR), you only get n − p real observations of variability ⇒ DoF = n − p. 42 Estimation of Error Variance Estimation of V2 !,"-"$).$s )/$0,"$123"($4506507 Where is s in the Excel output? . 8"9"9:"-$;,"/"<"-$=45$.""$>.0?/*?-*$"--4-@$-"?*$)0$?.$estimated Remember that whenever you see “standard error” read it as .0?/*?-*$*"<)?0)4/&$V ).$0,"$.0?/*?-*$*"<)?0)4/ estimated standard deviation: σ is the standard deviation. Applied Regression Analysis Carlos M. Carvalho !""#$%%%&$'()*"$+ 43 Sampling Distribution of Least Squares Estimates How much do our estimates depend on the particular random sample that we happen to observe? Imagine: I Randomly draw different samples of the same size. I For each sample, compute the estimates b0 , b1 , and s. If the estimates don’t vary much from sample to sample, then it doesn’t matter which sample you happen to observe. If the estimates do vary a lot, then it matters which sample you happen to observe. 44 Sampling Distribution of Least Squares Estimates 45 Sampling Distribution of Least Squares Estimates 46 Sampling Distribution of Least Squares Estimates LS lines are much closer to the true line when n = 50. For n = 5, some lines are close, others aren’t: we need to get “lucky” 47 Review: Sampling Distribution of Sample Mean Step back for a moment and consider the mean for an iid sample of n observations of a random variable {X1 , . . . , Xn } Suppose that E (Xi ) = µ and var(Xi ) = σ 2 I I E (X̄ ) = 1 n P E (Xi ) = µ P var(X̄ ) = var n1 Xi = 1 n2 P var (Xi ) = σ2 n 2 If X is normal, then X̄ ∼ N µ, σn . If X is not normal, we have the central limit theorem (more in a minute)! 48 Oracle vs SAP Example (understanding variation) 49 Oracle vs SAP 50 Oracle vs SAP Do you really believe that SAP affects ROE? How else could we look at this question? 51 Central Limit Theorem Simple CLT states that for iid random variables, X , with mean µ and variance σ 2 , the distribution of the sample mean becomes normal as the number of observations, n, gets large. σ2 ), and sample averages tend to be normally n distributed in large samples. That is, X̄ →n N(µ, 52 Central Limit Theorem 0.0 0.2 0.4 f(X) 0.6 0.8 1.0 Exponential random variables don’t look very normal: 0 1 2 3 4 5 X E [X ] = 1 and var(X ) = 1. 53 Central Limit Theorem 150 50 0 Frequency 250 1000 means from n=2 samples 0 1 2 3 4 5 apply(matrix(rexp(2 * 1000), ncol = 1000), 2, mean) 54 Central Limit Theorem 100 50 0 Frequency 150 1000 means from n=5 samples 0.0 0.5 1.0 1.5 2.0 2.5 3.0 apply(matrix(rexp(5 * 1000), ncol = 1000), 2, mean) 55 Central Limit Theorem 150 100 50 0 Frequency 200 250 1000 means from n=10 samples 0.0 0.5 1.0 1.5 2.0 apply(matrix(rexp(10 * 1000), ncol = 1000), 2, mean) 56 Central Limit Theorem 100 50 0 Frequency 150 200 1000 means from n=100 samples 0.7 0.8 0.9 1.0 1.1 1.2 1.3 apply(matrix(rexp(100 * 1000), ncol = 1000), 2, mean) 57 Central Limit Theorem 150 100 50 0 Frequency 200 1000 means from n=1k samples 0.90 0.95 1.00 1.05 1.10 apply(matrix(rexp(1000 * 1000), ncol = 1000), 2, mean) 58 Sampling Distribution of b1 The sampling distribution of b1 describes how estimator b1 = β̂1 varies over different samples with the X values fixed. It turns out that b1 is normally distributed: b1 ∼ N(β1 , σb21 ). I b1 is unbiased: E [b1 ] = β1 . I Sampling sd σb1 determines precision of b1 . 59 Sampling Distribution of b1 Can we intuit what should be in the formula for σb1 ? I How should σ figure in the formula? I What about n? I Anything else? var(b1 ) = P σ2 σ2 = (n − 1)sx2 (Xi − X̄ )2 Three Factors: sample size (n), error variance (σ 2 = σε2 ), and X -spread (sx ). 60 Sampling Distribution of b0 The intercept is also normal and unbiased: b0 ∼ N(β0 , σb20 ). σb20 = var(b0 ) = σ 2 1 X̄ 2 + n (n − 1)sx2 What is the intuition here? 61 The Importance of Understanding Variation When estimating a quantity, it is vital to develop a notion of the precision of the estimation; for example: I estimate the slope of the regression line I estimate the value of a house given its size I estimate the expected return on a portfolio I estimate the value of a brand name I estimate the damages from patent infringement Why is this important? We are making decisions based on estimates, and these may be very sensitive to the accuracy of the estimates! 62 The Importance of Understanding Variation Example from “everyday” life: I When building a house, we can estimate a required piece of wood to 1/4”? I When building a fine cabinet, the estimates may have to be accurate to 1/16” or even 1/32”. The standard deviations of the least squares estimators of the slope and intercept give a precise measurement of the accuracy of the estimator. However, these formulas aren’t especially practical since they involve the unknown quantity: σ 63 Estimated Variance We estimate variation with “sample standard deviations”: s sb1 = Recall that s = s2 (n − 1)sx2 qP s sb0 = s2 1 X̄ 2 + n (n − 1)sx2 ei2 /(n − 2) is the estimator for σ = σε . Hence, sb1 = σ̂b1 and sb0 = σ̂b0 are estimated coefficient sd’s. A high level of info/precision/accuracy means small sb values. 64 Normal and Student’s t Recall what Student discovered: If θ ∼ N(µ, σ 2 ), but you estimate σ 2 ≈ s 2 based on n − p degrees of freedom, then θ ∼ tn−p (µ, s 2 ). For example: I I Ȳ ∼ tn−1 (µ, sy2 /n). b0 ∼ tn−2 β0 , sb20 and b1 ∼ tn−2 β1 , sb21 The t distribution is just a fat-tailed version of the normal. As n − p −→ ∞, our tails get skinny and the t becomes normal. 65 Standardized Normal and Student’s t We’ll also usually standardize things: bj − βj bj − βj ∼ N(0, 1) =⇒ ∼ tn−2 (0, 1) σbj sbj We use Z ∼ N(0, 1) and Zn−p ∼ tn−p (0, 1) to represent standard random variables. Notice that the t and normal distributions depend upon assumed values for βj : this forms the basis for confidence intervals, hypothesis testing, and p-values. 66 Testing and Confidence Intervals (in 3 slides) Suppose Zn−p is distributed tn−p (0, 1). A centered interval is P(−tn−p,α/2 < Zn−p < tn−p,α/2 ) = 1 − α 67 Confidence Intervals Since bj ∼ tn−p (βj , sbj ), bj − βj < tn−p,α/2 ) sbj = P(bj − tn−p,α/2 sbj < βj < bj + tn−p,α/2 sbj ) 1 − α = P(−tn−p,α/2 < Thus (1 − α)*100% of the time, βj is within the Confidence Interval: bj ± tn−p,α/2 sbj 68 Testing Similarly, suppose that assuming bj ∼ tn−p (βj , sbj ) for our sample bj leads to (recall Zn−p ∼ tn−p (0, 1)) bj − βj bj − β j = ϕ. + P Zn−p > P Zn−p < − sbj sbj Then the “p-value” is ϕ = 2P(Zn−p > |bj − βj |/sbj ). You do this calculation for βj = βj0 , an assumed null/safe value, and only reject βj0 if ϕ is too small (e.g., ϕ < 1/20). In regression, βj0 = 0 almost always. 69 More Detail... Confidence Intervals Why should we care about Confidence Intervals? I The confidence interval captures the amount of information in the data about the parameter. I The center of the interval tells you what your estimate is. I The length of the interval tells you how sure you are about your estimate. 70 More Detail... Testing Suppose that we are interested in the slope parameter, β1 . For example, is there any evidence in the data to support the existence of a relationship between X and Y? We can rephrase this in terms of competing hypotheses. H0 : β1 = 0. Null/safe; implies “no effect” and we ignore X . H1 : β1 6= 0. Alternative; leads us to our best guess β1 = b1 . 71 Hypothesis Testing If we want statistical support for a certain claim about the data, we want that claim to be the alternative hypothesis. Our hypothesis test will either reject or not reject the null hypothesis (the default if our claim is not true). If the hypothesis test rejects the null hypothesis, we have statistical support for our claim! 72 Hypothesis Testing We use bj for our test about βj . I Reject H0 when bj is far from βj0 (ususally 0). I Assume H0 when bj is close to βj0 . An obvious tactic is to look at the difference bj − βj0 . But this measure doesn’t take into account the uncertainty in estimating bj : What we really care about is how many standard deviations bj is away from βj0 . 73 Hypothesis Testing The t-statistic for this test is bj − βj0 bj zbj = = for βj0 = 0. sbj sb j If H0 is true, this should be distributed zbj ∼ tn−p (0, 1). I Small |zbj | leaves us happy with the null βj0 . I Large |zbj | (i.e., > about 2) should get us worried! 74 Hypothesis Testing We assess the size of zbj with the p-value : ϕ = P(|Zn−p | > |zbj |) = 2P(Zn−p > |zbj |) (once again, Zn−p ∼ tn−p (0, 1)). 0.2 -z z 0.0 p(Z) 0.4 p-value = 0.05 (with 8 df) -4 -2 0 2 4 Z 75 Hypothesis Testing The p-value is the probablity, assuming that the null hypothesis is true, of seeing something more extreme (further from the null) than what we have observed. You can think of 1 − ϕ (inverse p-value) as a measure of distance between the data and the null hypothesis. In other words, 1 − ϕ is the strength of evidence against the null. 76 Hypothesis Testing The formal 2-step approach to hypothesis testing I Pick the significance level α (often 1/20 = 0.05), our acceptable risk (probability) of rejecting a true null hypothesis (we call this a type 1 error). This α plays the same role as α in CI’s. I Calculate the p-value, and reject H0 if ϕ < α (in favor of our best alternative guess; e.g. βj = bj ). If ϕ > α, continue working under null assumptions. This is equivilent to having the rejection region |zbj | > tn−p,α/2 . 77 Example: Hypothesis Testing Consider again a CAPM regression for the Windsor fund. Does Windsor have a non-zero intercept? (i.e., does it make/lose money independent of the market?). H0 : β0 = 0 and there is no-free money. H1 : β0 6= 0 and Windsor is cashing regardless of market. 78 Hypothesis Testing – Windsor Fund Example -"./(($01"$!)2*345$5"65"33)42$72$8$,9:;< Example: Hypothesis Testing b! sb! b! sb! Applied Regression Analysis Carlos M. Carvalho !""#$%%%&$'()*"$+, It turns out that we reject the null at α = .05 (ϕ = .0105). Thus Windsor does have an “alpha” over the market. 79 Example: Hypothesis Testing Looking at the slope, this is a very rare case where the null hypothesis is not zero: H0 : β1 = 1 Windsor is just the market (+ alpha). H1 : β1 6= 1 and Windsor softens or exaggerates market moves. We are asking whether or not Windsor moves in a different way than the market (e.g., is it more conservative?). Now, t= −0.0643 b1 − 1 = = −2.205 sb1 0.0291 tn−2,α/2 = t178,0.025 = 1.96 Reject H0 at the 5% level 80 Forecasting The conditional forecasting problem: Given covariate Xf and sample data {Xi , Yi }ni=1 , predict the “future” observation yf . The solution is to use our LS fitted value: Ŷf = b0 + b1 Xf . This is the easy bit. The hard (and very important!) part of forecasting is assessing uncertainty about our predictions. 81 Forecasting If we use Ŷf , our prediction error is ef = Yf − Ŷf = Yf − b0 − b1 Xf 82 Forecasting This can get quite complicated! A simple strategy is to build the following (1 − α)100% prediction interval: b0 + b1 Xf ± tn−2,α/2 s A large predictive error variance (high uncertainty) comes from I Large s (i.e., large ε’s). I Small n (not enough data). I Small sx (not enough observed spread in covariates). I Large difference between Xf and X̄ . Just remember that you are uncertain about b0 and b1 ! Reasonably inflating the uncertainty in the interval above is always a good idea... as always, this is problem dependent. 83 Forecasting For Xf far from our X̄ , the space between lines is magnified... 84 Glossary and Equations I Ŷi = b0 + b1 Xi is the ith fitted value. I ei = Yi − Ŷi is the ith residual. I s: standard error of regression residuals (≈ σ = σε ). s2 = I 1 X 2 ei n−2 sbj : standard error of regression coefficients. s sb1 = s2 (n − 1)sx2 s sb0 = s 1 X̄ 2 + n (n − 1)sx2 85 Glossary and Equations I α is the significance level (prob of type 1 error). I tn−p,α/2 is the value such that for Zn−p ∼ tn−p (0, 1), P(Zn−p > tn−p,α/2 ) = P(Zn−p < −tn−p,α/2 ) = α/2. I zbj ∼ tn−p (0, 1) is the standardized coefficient t-value: zbj = I bj − βj0 sbj (= bj /sbj most often) The (1 − α) ∗ 100% for βj is bj ± tn−p,α/2 sbj . 86