Inference for Regression
Inference about the Regression Model and Using the Regression Line (Chapter 10; Guan, ch. 10 and 11)

Research question: Do we have a downward-sloping demand curve? (Chen et al., 2006, JPE.) The observed price-quantity pairs are:

  P: 10 10 10 10  7  7  7  7  4  4  4  4
  Q:  2  3  4  5  4  5  6 10  5  9 15 20

What is the demand elasticity, (dQ/Q)/(dP/P)?

Research question: Does garbage predict stock returns? In "Asset Pricing with Garbage" (Journal of Finance, 2011), Alexi Savov lets d(garbage) be the annual growth in total garbage and finds that a 1% increase in garbage growth is associated with a 2.44% increase in stock returns.

Objectives: inference about the regression model and using the regression line
- Statistical model for simple linear regression
- Estimating the regression parameters
- Conditions for regression inference
- Confidence intervals and significance tests
- Inference about correlation
- Confidence and prediction intervals for μy
- Standard errors
- Analysis of variance for regression
- Some advanced issues

Example fitted line: ŷ = 0.125x + 41.4. The data in a scatterplot are a random sample from a population that may exhibit a linear relationship between x and y; a different sample gives a different plot. We want to describe the population mean response μy as a function of the explanatory variable x, μy = β0 + β1x, and to assess whether the observed relationship is statistically significant (not entirely explained by chance events due to random sampling).

Some terminology
- The y variable: dependent variable, response, regressand
- The x variable: independent variable, explanatory variable
- Ordinary least squares (OLS)
- Residual

Statistical model for simple linear regression
In the population, the linear regression equation is μy = β0 + β1x. Sample data then fit the model

  data = fit + residual
  yi = (β0 + β1 xi) + εi,

where the εi are independent and identically distributed N(0, σ) (written i.i.d. ~ N(0, σ)). Linear regression assumes equal variance of y (σ is the same for all values of x).
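The demand-curve question above can be answered numerically. Below is a minimal sketch (assuming NumPy is available; the data are the twelve price-quantity pairs from the table) that fits the least-squares line of Q on P and evaluates the elasticity at the sample means:

```python
import numpy as np

# Twelve price-quantity observations from the demand-curve example.
P = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
Q = np.array([2, 3, 4, 5, 4, 5, 6, 10, 5, 9, 15, 20], dtype=float)

# Least-squares slope and intercept of Q-hat = b0 + b1 * P.
b1 = np.sum((P - P.mean()) * (Q - Q.mean())) / np.sum((P - P.mean()) ** 2)
b0 = Q.mean() - b1 * P.mean()

# Demand elasticity (dQ/Q)/(dP/P) = (dQ/dP) * (P/Q), evaluated at the means.
elasticity = b1 * P.mean() / Q.mean()
print(f"Q-hat = {b0:.3f} + ({b1:.3f}) P, elasticity at means = {elasticity:.3f}")
```

The negative fitted slope (about −1.46) is consistent with a downward-sloping demand curve.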
Estimating the regression parameters
In μy = β0 + β1x, the intercept β0, the slope β1, and the standard deviation σ of y are the unknown parameters of the regression model. We rely on random sample data to provide unbiased estimates of these parameters.

The value of ŷ from the least-squares regression line is really a prediction of the mean value of y (μy) for a given value of x. The least-squares regression line ŷ = b0 + b1x obtained from sample data is the best estimate of the true population regression line μy = β0 + β1x:
- ŷ: unbiased estimate of the mean response μy
- b0: unbiased estimate of the intercept β0
- b1: unbiased estimate of the slope β1

The population standard deviation σ of y at any given value of x represents the spread of the normal distribution of the εi around the mean μy. The regression standard error s, for n sample data points, is calculated from the residuals (yi − ŷi):

  s² = Σ residual² / (n − 2) = Σ (yi − ŷi)² / (n − 2).

s² is an unbiased estimate of the regression variance σ².

Conditions for regression inference
- The observations are independent.
- The relationship is indeed linear: β0 + β1x² is a linear model (linear in the parameters); β0 + x^β1 is not (non-linear in the parameters).
- The standard deviation of y, σ, is the same for all values of x.
- The response y varies normally around its mean.

Using residual plots to check regression validity
The residuals (y − ŷ) give useful information about the contribution of individual data points to the overall pattern of scatter. We view the residuals in a residual plot. If the residuals are scattered randomly around 0 with uniform variation, the data fit a linear model with normally distributed residuals for each value of x and constant standard deviation σ.
- Residuals randomly scattered: good.
- Curved pattern: the relationship is not linear.
- Change in variability across the plot: σ is not equal for all values of x.
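The regression standard error defined above can be computed directly. A sketch (NumPy assumed; the demand-curve data from the earlier example are reused purely as an illustration):

```python
import numpy as np

# Demand-curve data reused as illustration.
x = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
y = np.array([2, 3, 4, 5, 4, 5, 6, 10, 5, 9, 15, 20], dtype=float)
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

# s^2 = sum((y_i - yhat_i)^2) / (n - 2): unbiased estimate of sigma^2.
s2 = np.sum(residuals ** 2) / (n - 2)
s = np.sqrt(s2)
print(f"s = {s:.3f} (s^2 = {s2:.3f})")
```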
Example: What is the relationship between job stress and locus of control (LOC)? We plot stress level against LOC for a random sample of 100 accountants. The relationship is approximately linear with no outliers. Residual plot: the spread of the residuals is reasonably random with no clear pattern, so the relationship is indeed linear. Normal quantile plot of the residuals: the plot is fairly straight, supporting the assumption of normally distributed residuals. The data are suitable for inference.

Now we look at the statistical properties and inference in more detail.

Ordinary least squares method
Find β0 and β1 to minimize the errors. We square the errors so that positive and negative values do not cancel each other out. The objective function is

  Q(β0, β1) = (1/n) Σ (Yi − β0 − β1 Xi)² = (1/n) Σ Ui².

OLS finds the line that makes the sum of squared errors smallest. The first-order conditions are

  ∂Q/∂β0 = −(2/n) Σ (Yi − β0 − β1 Xi) = 0,
  ∂Q/∂β1 = −(2/n) Σ (Yi − β0 − β1 Xi) Xi = 0.

Solving the first-order conditions gives the ordinary least squares (OLS) estimators

  b1,n = Σ (Xi − X̄n)(Yi − Ȳn) / Σ (Xi − X̄n)²,
  b0,n = Ȳn − b1,n X̄n,

and the estimated regression line Ŷ = b0,n + b1,n X. What happens if Xi = X0 for every i, i.e., X is a constant term? (Then Σ(Xi − X̄n)² = 0 and the slope is not identified.) What happens if we restrict β0 = 0? We get regression without an intercept, Ŷ = b̂1,n X.

Other expressions for the OLS slope:

  b1,n = Σ (Xi − X̄n)(Yi − Ȳn) / Σ (Xi − X̄n)²
       = (Σ XiYi − n X̄n Ȳn) / (Σ Xi² − n X̄n²)
       = Σ (Xi − X̄n) Yi / Σ (Xi − X̄n)²
       = Σ Ki Yi,  where Ki = (Xi − X̄n) / Σj (Xj − X̄n)².

Some properties under OLS
With the residual Ûi = Yi − Ŷi, the first-order conditions imply

  Σ Ûi = 0,  Σ Xi Ûi = 0,  Σ Ŷi Ûi = 0,

and the estimated regression line must pass through (X̄n, Ȳn).

Confidence intervals for the regression parameters
Estimating the regression parameters β0, β1 is a case of one-sample inference with unknown population variance.
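Before turning to inference, the OLS first-order-condition identities can be verified numerically. A sketch (NumPy assumed; demand-curve data reused as illustration):

```python
import numpy as np

x = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
y = np.array([2, 3, 4, 5, 4, 5, 6, 10, 5, 9, 15, 20], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

# The three identities implied by the first-order conditions:
sum_u = uhat.sum()                 # residuals sum to zero
sum_xu = (x * uhat).sum()          # residuals orthogonal to x
sum_yhat_u = (yhat * uhat).sum()   # residuals orthogonal to fitted values
print(sum_u, sum_xu, sum_yhat_u)

# The fitted line passes through (x-bar, y-bar):
print(b0 + b1 * x.mean(), y.mean())
```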
We rely on the t distribution with n − 2 degrees of freedom.
- A level C confidence interval for the slope β1 is: b1 ± t* SEb1.
- A level C confidence interval for the intercept β0 is: b0 ± t* SEb0.
Here t* is the critical value of the t(n − 2) distribution with area C between −t* and +t*. The formulas for SEb1 and SEb0 are given in a later section.

Significance test for the slope
We can test the hypothesis H0: β1 = 0 against a one- or two-sided alternative. We calculate t = b1 / SEb1, which has the t(n − 2) distribution, to find the p-value of the test. Note: software typically provides two-sided p-values.

Testing the hypothesis of no relationship
We may look for evidence of a significant relationship between the variables x and y in the population from which our data were drawn. For that, we test whether the regression slope parameter is zero:

  H0: β1 = 0  vs.  Ha: β1 ≠ 0.

Because b1 = r (sy / sx), testing H0: β1 = 0 also allows us to test the hypothesis of no correlation between x and y in the population. Note: a test of hypothesis for β0 is usually irrelevant (the value x = 0 is often not even attainable).

Using technology
Computer software runs all the computations for regression analysis. In the output for the stress/LOC example, the row labeled "intercept" gives the intercept and the row labeled "LOC" gives the slope, each with a p-value for its test of significance and a confidence interval. The t-test for the regression slope is highly significant (p = 0.002): there is a significant relationship between stress and LOC.

Inference for correlation
To test the null hypothesis of no linear association, we also have the choice of using the correlation parameter ρ. When x is clearly the explanatory variable, this test is equivalent to testing H0: β1 = 0, since b1 = r (sy / sx). When there is no clear explanatory variable (e.g., arm length vs. leg length), a regression of x on y is no more legitimate than one of y on x. In that case, the correlation test of significance should be used.
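The slope test described above can be sketched as follows (assuming NumPy and SciPy are available; demand-curve data reused as illustration):

```python
import numpy as np
from scipy import stats

x = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
y = np.array([2, 3, 4, 5, 4, 5, 6, 10, 5, 9, 15, 20], dtype=float)
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

# t = b1 / SE_b1, with SE_b1 = s / sqrt(Sxx), df = n - 2.
se_b1 = s / np.sqrt(Sxx)
t_stat = b1 / se_b1
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}")
```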
When both x and y are normally distributed, H0: ρ = 0 tests for no association of any kind between x and y, not just linear associations. The test of significance for ρ uses the one-sample t-test for H0: ρ = 0. We compute the t statistic from the sample size n and the correlation coefficient r:

  t = r √(n − 2) / √(1 − r²).

The p-value is the area under t(n − 2) for values of T as extreme as t, or more so, in the direction of Ha.

Confidence interval for the mean response
Using inference, we can also calculate a confidence interval for the population mean μy of all responses y when x takes the value x* (within the range of the data tested). This interval is centered on ŷ, the unbiased estimate of μy. The true value of the population mean μy at a given value of x will indeed be within our confidence interval in C% of all intervals calculated from many different random samples.

The level C confidence interval for the mean response μy at a given value x* of x is

  ŷ ± t* SEμ̂,

where t* is the critical value of the t(n − 2) distribution with area C between −t* and +t*. (We again leave the formula for SEμ̂ to a later section.) A separate confidence interval is calculated for μy at every value x takes; graphically, the series of confidence intervals forms a continuous band on either side of ŷ (the 95% confidence band for μy).

Prediction interval for the regression response
One use of regression is to predict the value of y, ŷ, for any value of x within the range of the data tested: ŷ = b0 + b1x. But the regression equation depends on the particular sample drawn, so more reliable predictions require statistical inference. To estimate an individual response y for a given value of x, we use a prediction interval. If we randomly sampled many times, we would obtain many different values of y for a particular x, distributed N(0, σ) around the mean response μy.
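The correlation t statistic above can be computed and compared with the slope t statistic; the two are algebraically identical. A sketch (NumPy assumed; demand-curve data reused as illustration):

```python
import numpy as np

x = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
y = np.array([2, 3, 4, 5, 4, 5, 6, 10, 5, 9, 15, 20], dtype=float)
n = len(x)

# t statistic for H0: rho = 0.
r = np.corrcoef(x, y)[0, 1]
t_r = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

# Compare with the slope t statistic t = b1 / SE_b1.
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))
t_slope = b1 / (s / np.sqrt(Sxx))
print(t_r, t_slope)
```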
The level C prediction interval for a single observation of y when x takes the value x* is

  ŷ ± t* SEŷ,

where t* is the critical value of the t(n − 2) distribution with area C between −t* and +t*. The prediction interval represents mainly the error from the normal distribution of the residuals εi. Graphically, the series of prediction intervals forms a continuous band on either side of ŷ (the 95% prediction band for ŷ), wider than the 95% confidence band for μy.

Example: the 1918 flu epidemic
We look at the relationship between the number of flu deaths in a given week and the number of new cases diagnosed one week earlier. Weekly incidence:

  Week:    1    2     3     4     5     6    7    8   9   10    11    12    13    14    15    16    17
  Cases:  36  531  4233  8682  7164  2229  600  164  57  722  1517  1828  1539  2416  3148  3465  1440
  Deaths:  0    0   130   552   738   414  198   90  56   50    71   137   178   194   290   310   149

The line graphs of cases and deaths suggest that about 7 to 8% of those diagnosed with the flu died within about a week of their diagnosis. The scatterplot of deaths against the previous week's cases gives r = 0.91.

EXCEL output for the regression of deaths (FluDeaths1) on the previous week's cases (FluCases0):

  Regression Statistics
  Multiple R          0.911
  R Square            0.830
  Adjusted R Square   0.82
  Standard Error      85.07   (s)
  Observations        16

  Coefficients: intercept b0 = 49.292; FluCases0 slope b1 = 0.072.
Standard errors and tests for the EXCEL coefficients: SEb0 = 29.845 and SEb1 = 0.009; t statistics 1.652 (intercept) and 8.263 (slope); p-values 0.1209 and < 0.0001; 95% confidence intervals (−14.720, 113.304) for the intercept and (0.053, 0.091) for the slope.

The p-value for H0: β1 = 0 is very small, so we reject H0: β1 is significantly different from 0. There is a significant relationship between the number of flu cases and the number of deaths from flu a week later.

From the SPSS output: the 95% confidence interval for the mean weekly death count one week after 4,000 flu cases are diagnosed is roughly 300 to 380 deaths (μy), while the 95% prediction interval for a single weekly death count one week after 4,000 diagnosed cases is roughly 180 to 500 deaths (ŷ).

Standard errors
To make inferences about the regression coefficients, we calculate the standard errors of the estimated coefficients. The standard error of the least-squares slope b1 is

  SEb1 = s / √Σ(xi − x̄)².

The standard error of the intercept b0 is

  SEb0 = s √( 1/n + x̄² / Σ(xi − x̄)² ).

To estimate or predict future responses, we calculate the standard error of the mean response μ̂y at x*,

  SEμ̂ = s √( 1/n + (x* − x̄)² / Σ(xi − x̄)² ),

and the standard error for predicting an individual response ŷ at x*,

  SEŷ = s √( 1 + 1/n + (x* − x̄)² / Σ(xi − x̄)² ).

Analysis of variance for regression
The regression model is: data = fit + residual, yi = (β0 + β1 xi) + εi, where the εi are i.i.d. N(0, σ) and σ is the same for all values of x. It resembles an ANOVA, which also assumes equal variance, where

  SST = SSM + SSE  and  DFT = DFM + DFE.

Goodness of fit
Different independent (X) variables may have different explanatory power for the dependent variable (Y). If we can find a model that better explains the variation in Y, then we have a better model interpretation. The criterion used to select among models is goodness of fit. For example, if the inflation rate fits stock returns better than the GDP growth rate does, we say the inflation rate gives a better model for explaining the pattern of stock returns.
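The standard-error formulas above turn directly into the confidence and prediction intervals discussed earlier. A sketch (NumPy and SciPy assumed; demand-curve data reused as illustration, with x* = 7 chosen arbitrarily):

```python
import numpy as np
from scipy import stats

x = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
y = np.array([2, 3, 4, 5, 4, 5, 6, 10, 5, 9, 15, 20], dtype=float)
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

# Standard errors of the coefficients.
se_b1 = s / np.sqrt(Sxx)
se_b0 = s * np.sqrt(1 / n + x.mean() ** 2 / Sxx)

# Standard errors for the mean response and an individual prediction at x*.
x_star = 7.0
y_hat = b0 + b1 * x_star
se_mean = s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / Sxx)
se_pred = s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / Sxx)

# 95% confidence interval for mu_y and 95% prediction interval at x*.
t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
print(se_b0, se_b1, ci, pi)
```

As expected, the prediction interval is strictly wider than the confidence interval for the mean response at the same x*.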
Goodness of fit (continued)
We look at the variation in Y after removing its mean:

  Yi − Ȳn = (Ŷi + Ûi) − Ȳn = (Ŷi − Ȳn) + Ûi.

Squaring and summing, the cross term Σ(Ŷi − Ȳn)Ûi vanishes by the OLS first-order conditions, so

  Σ (Yi − Ȳn)² = Σ (Ŷi − Ȳn)² + Σ Ûi².

- The first term is the total sum of squares (SST, also called TSS).
- The second term is the regression sum of squares (SSR, RSS, or SSM). Some econometricians call this term SSE (sum of squares explained), which makes things confusing!
- The third term is the error sum of squares (SSE). Some econometricians call this term SSR (sum of squared residuals), which makes things equally confusing!

Degrees of freedom
In SST we use one degree of freedom to calculate the sample mean, so the df for SST is n − 1. SSE comes from OLS, whose two first-order conditions cost two degrees of freedom, so the df for SSE is n − 2. The difference between the df for SST and the df for SSE is the df for SSM, which is 1.

For a simple linear relationship, the ANOVA tests the hypotheses H0: β1 = 0 versus Ha: β1 ≠ 0 by comparing MSM (model) to MSE (error):

  F = MSM / MSE  (this form is valid only for the simple regression model).

When H0 is true, F follows the F(1, n − 2) distribution, and the p-value is P(F > f). The ANOVA test and the two-sided t-test for H0: β1 = 0 yield the same p-value. Software output for regression may provide t, F, or both, along with the p-value.

ANOVA table

  Source               SS             DF      MS         F         P-value
  Model (Regression)   Σ(ŷi − ȳ)²     1       SSM/DFM    MSM/MSE   tail area above F
  Error (Residual)     Σ(yi − ŷi)²    n − 2   SSE/DFE
  Total                Σ(yi − ȳ)²     n − 1

  SST = SSM + SSE,  DFT = DFM + DFE.

The regression standard error s, for n sample data points, is calculated from the residuals ei = yi − ŷi:

  s² = Σ ei² / (n − 2) = Σ (yi − ŷi)² / (n − 2) = SSE / DFE = MSE.

s² is an unbiased estimate of the regression variance σ².
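The ANOVA decomposition and F test above can be checked numerically, including the identity F = t² for simple regression. A sketch (NumPy assumed; demand-curve data reused as illustration):

```python
import numpy as np

x = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
y = np.array([2, 3, 4, 5, 4, 5, 6, 10, 5, 9, 15, 20], dtype=float)
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)
SSM = np.sum((yhat - y.mean()) ** 2)
SSE = np.sum((y - yhat) ** 2)

MSM = SSM / 1          # DFM = 1
MSE = SSE / (n - 2)    # DFE = n - 2
F = MSM / MSE

# In simple regression, F equals the square of the slope t statistic.
t_slope = b1 / (np.sqrt(MSE) / np.sqrt(Sxx))
print(SST, SSM + SSE, F, t_slope ** 2)
```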
Coefficient of determination, r²
The coefficient of determination r², the square of the correlation coefficient, is the percentage of the variance in y (the vertical scatter from the regression line) that can be explained by changes in x:

  r² = (variation in y explained by x, i.e., the regression line) / (total variation in observed y around the mean)
     = Σ (ŷi − ȳ)² / Σ (yi − ȳ)² = SSM / SST.

Two types of coefficient of determination
There are two versions of r²: the centered and the non-centered coefficient of determination. When there is no indication, r² usually refers to the former.

  Centered R²     = Σ (Ŷi − Ȳn)² / Σ (Yi − Ȳn)² = 1 − Σ Ûi² / Σ (Yi − Ȳn)².
  Non-centered R² = Σ Ŷi² / Σ Yi²               = 1 − Σ Ûi² / Σ Yi².

How do we interpret R²? The higher R² is, the more of the variation in Y is captured by X, and the better the goodness of fit. When R² = 1 there is no residual, and we call it a perfect fit. When R² = 0, SSE = SST, so the model has no explanatory power at all.

Some properties of R²
What happens if we rescale Y′ = Y/100? (R² is unchanged: it is invariant to rescaling Y.) An alternative expression:

  R² = [Σ (Ŷi − Ȳn)(Yi − Ȳn)]² / [ Σ (Yi − Ȳn)² Σ (Ŷi − Ȳn)² ]
     = [Σ (Xi − X̄n)(Yi − Ȳn)]² / [ Σ (Yi − Ȳn)² Σ (Xi − X̄n)² ],

the squared sample correlation between X and Y (equivalently, between Ŷ and Y).

Interpreting regression results using software (SPSS): r² = SSM/SST = 2.157/22.114; the ANOVA F-test and the t-test give the same p-value.
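The centered and non-centered R² above, and the scale invariance of the centered version, can be verified numerically. A sketch (NumPy assumed; demand-curve data reused as illustration):

```python
import numpy as np

x = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
y = np.array([2, 3, 4, 5, 4, 5, 6, 10, 5, 9, 15, 20], dtype=float)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
uhat = y - (b0 + b1 * x)

r2_centered = 1 - np.sum(uhat ** 2) / np.sum((y - y.mean()) ** 2)
r2_noncentered = 1 - np.sum(uhat ** 2) / np.sum(y ** 2)

# Centered R^2 is invariant to rescaling y (e.g., y' = y / 100):
ys = y / 100
b1s = np.sum((x - x.mean()) * (ys - ys.mean())) / np.sum((x - x.mean()) ** 2)
b0s = ys.mean() - b1s * x.mean()
us = ys - (b0s + b1s * x)
r2_scaled = 1 - np.sum(us ** 2) / np.sum((ys - ys.mean()) ** 2)
print(r2_centered, r2_noncentered, r2_scaled)
```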
Returning to the 1918 flu example (deaths in a given week vs. new diagnosed cases one week earlier, r = 0.91), the MINITAB output is:

  Regression Analysis: FluDeaths1 versus FluCases0
  The regression equation is FluDeaths1 = 49.3 + 0.0722 FluCases0

  Predictor   Coef        SE Coef     T      P
  Constant    49.29       29.85       1.65   0.121
  FluCases    0.072222    0.008741    8.26   0.000

  S = 85.07  (s = √MSE)    R-Sq = 83.0%  (r² = SSM/SST)    R-Sq(adj) = 81.8%

  Analysis of Variance
  Source           DF    SS        MS        F       P
  Regression        1    494041    494041    68.27   0.000
  Residual Error   14    101308    7236  (MSE = s²)
  Total            15    595349

  (SSM = 494041, SST = 595349; the P-value is for H0: β1 = 0 vs. Ha: β1 ≠ 0.)

Another example: the relation between share repurchases and the book-to-market ratio, using 8,434 repurchase announcement returns in SAS. What do you find?

Advanced Analyses of Regression*
- BLUE (best linear unbiased estimator)
- Asymptotic least squares
- Maximum likelihood estimation (MLE)
- Conditional heteroskedasticity
- Weighted least squares (WLS)

Statistical properties of OLS
Specify the linear regression as

  Yi = α + β Xi + Ui,  i = 1, …, n.

Two classical conditions of regression analysis:
[B1] Xi, i = 1, …, n, are non-stochastic variables.
[B2] There exist α0 and β0 such that Yi = α0 + β0 Xi + Vi, i = 1, …, n, where Vi satisfies
  (i)  E(Vi) = 0, i = 1, …, n;
  (ii) var(Vi) = σ0², i = 1, …, n, and cov(Vi, Vj) = 0 when i ≠ j.

Under [B2], Vi is the error term under the true parameters α0 and β0; Vi is also known as the disturbance. When the Vi have identical variance, the Vi are homoscedastic; otherwise the Vi are heteroskedastic.
Regression variance
To estimate the parameter σ0² in [B2](ii), the estimator under OLS is the sum of squared errors divided by its degrees of freedom:

  σ̂n² = (1/(n − 2)) Σ Ûi²,  the regression variance.

If we consider a model with only an intercept, the regression variance is exactly the sample variance:

  σ̂n² = (1/(n − 1)) Σ (Yi − Ȳn)².

Properties of the OLS estimators
Given [B1] and [B2](i), the OLS estimators are linear and unbiased estimators of α0 and β0. Given [B1] and [B2],

  var(β̂n)        = σ0² / Σ (Xi − X̄n)²,
  var(α̂n)        = σ0² ( 1/n + X̄n² / Σ (Xi − X̄n)² ),
  cov(α̂n, β̂n)   = −σ0² X̄n / Σ (Xi − X̄n)².

Best linear unbiased estimator (BLUE)
Assuming [B1] and [B2] hold, the Gauss-Markov theorem tells us that the OLS estimators are the best linear unbiased estimators (BLUE). Under the same assumptions, the residuals satisfy

  E(Ûi) = 0,
  var(Ûi) = σ0² ( 1 − 1/n − (Xi − X̄n)² / Σj (Xj − X̄n)² ),  i = 1, …, n,
  cov(Ûi, Ûj) = −σ0² ( 1/n + (Xi − X̄n)(Xj − X̄n) / Σk (Xk − X̄n)² ),  i ≠ j.

Given [B1] and [B2], σ̂n² is an unbiased estimator of σ0², yet it is not the best linear estimator. Given [B1] and [B2], the parameter variance estimators below are unbiased estimators of var(β̂n), var(α̂n), and cov(α̂n, β̂n):

  S²β̂   = σ̂n² / Σ (Xi − X̄n)²,
  S²α̂   = σ̂n² ( 1/n + X̄n² / Σ (Xi − X̄n)² ),
  Sα̂β̂  = −X̄n σ̂n² / Σ (Xi − X̄n)².

Strengthening the classical conditions
So far we have not required a distributional form for Vi. We now add more restrictions and learn the exact distributions of the parameter estimators:

[B2′] There exist α0 and β0 such that Yi = α0 + β0 Xi + Vi, i = 1, …, n, where the Vi are independent variables obeying N(0, σ0²).

If [B1] and [B2′] hold, then

  β̂n ~ N( β0, σ0² / Σ (Xi − X̄n)² ),
  α̂n ~ N( α0, σ0² ( 1/n + X̄n² / Σ (Xi − X̄n)² ) ),
  Ûi ~ N( 0, σ0² ( 1 − 1/n − (Xi − X̄n)² / Σj (Xj − X̄n)² ) ).
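The unbiasedness of the OLS slope and its variance formula var(β̂n) = σ0²/Σ(Xi − X̄n)² can be illustrated by simulation under [B1] and [B2′]. A sketch (NumPy assumed; the fixed regressor and the true parameter values α0 = 1, β0 = 2, σ0 = 1.5 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (non-stochastic) regressor, per condition [B1]; true parameters assumed.
x = np.array([10, 10, 10, 10, 7, 7, 7, 7, 4, 4, 4, 4], dtype=float)
alpha0, beta0, sigma0 = 1.0, 2.0, 1.5
Sxx = np.sum((x - x.mean()) ** 2)

reps = 5000
slopes = np.empty(reps)
for i in range(reps):
    y = alpha0 + beta0 * x + rng.normal(0.0, sigma0, size=x.size)
    slopes[i] = np.sum((x - x.mean()) * (y - y.mean())) / Sxx

theory_var = sigma0 ** 2 / Sxx  # var(beta-hat) = sigma0^2 / Sxx
print(slopes.mean(), slopes.var(), theory_var)
```

Across replications, the average estimated slope is close to β0 = 2 and the empirical variance is close to the theoretical value.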
Strengthened classical conditions (continued)
If [B1] and [B2′] hold, then:
- α̂n and σ̂n² are independent, and β̂n and σ̂n² are independent;
- (n − 2) σ̂n² / σ0² ~ χ²(n − 2).

Without the normal distribution assumption, the OLS estimators are not necessarily independent of the variance estimators, and the χ² distribution need not hold. Under the normal distribution assumption, the OLS estimator is not only the BLUE but also the best unbiased estimator.

Asymptotic least squares
We next relax the condition of non-stochastic X:
[C1] {(X1, Y1), …, (Xn, Yn)} are i.i.d. random variables with finite variances.
[C2] There exist α0 and β0 such that Yi = α0 + β0 Xi + Vi, i = 1, …, n, where
  (i)  E(Vi | Xi) = 0;
  (ii) var(Vi | Xi) = σ0², i = 1, …, n.

OLS in large samples
Under [C1], we can apply the weak law of large numbers to get

  β̂n = [Σ (Xi − X̄n)(Yi − Ȳn)/(n − 1)] / [Σ (Xi − X̄n)²/(n − 1)]  →p  σXY / σX²,
  α̂n = Ȳn − β̂n X̄n  →p  μY − (σXY / σX²) μX,
  R² = [Σ (Xi − X̄n)(Yi − Ȳn)/(n − 1)]² / { [Σ (Yi − Ȳn)²/(n − 1)] [Σ (Xi − X̄n)²/(n − 1)] }  →p  σXY² / (σX² σY²) = ρXY²,
  σ̂n²  →p  σY² (1 − ρXY²).

When [C1] and [C2](i) hold, β0 = σXY/σX² and α0 = μY − μX σXY/σX². Under [C1] and [C2], σ0² = σY² (1 − ρXY²). Therefore, the parameters are determined by the data (data variances and correlations) rather than by the model setting. Again writing the regression model as Yi = α + β Xi + Ui:
- Under [C1] and [C2](i), the OLS estimators β̂n and α̂n are consistent estimators of β0 and α0.
- Under [C1] and [C2], σ̂n² is a consistent estimator of σ0².

Asymptotic distribution of the estimators
Under [C1], [C2], and the linear model shown above,

  √n (β̂n − β0)  ~A  N( 0, σ0² / σX² ),
  √n (α̂n − α0)  ~A  N( 0, σ0² (1 + μX² / σX²) ).

[C1] requires a random sample, making the random variables independent. When the weak law of large numbers and the central limit theorem hold, consistency and the asymptotic properties hold as well.
Maximum likelihood estimation (MLE)
Under [B1] and [B2′], with α0 and β0 such that Yi = α0 + β0 Xi + Vi, i = 1, …, n, we have Yi ~ N(α0 + β0 Xi, σ0²) with Yi and Yj independent, so we can use the maximum likelihood method to estimate the parameters. Let θ0 := (α0, β0, σ0²), with {Yi}, i = 1, …, n, independent N(α0 + β0 Xi, σ0²); the likelihood function is the product of these normal densities. Taking the natural logarithm of the likelihood function and solving the first-order conditions gives

  bn = Σ (Xi − X̄n)(Yi − Ȳn) / Σ (Xi − X̄n)²,
  an = Ȳn − bn X̄n,

the same as the OLS estimators, but a biased variance estimator:

  σ̂n² = (1/n) Σ Ûi².

Conditional heteroskedasticity
Now we further relax the homoscedasticity assumption:
[H1] {(X1, Y1), …, (Xn, Yn)} are i.i.d. random variables with at least fourth moments.
[H2] There exist α0 and β0 such that Yi = α0 + β0 Xi + Vi, i = 1, …, n, with E(Vi | Xi) = 0.

We do not assume var(Vi | Xi) to be unrelated to Xi, so under [H1] and [H2], var(Vi | Xi) may be a function of Xi.

White (1980) adjustment
One way to deal with heteroskedasticity is White's method. White's test grew out of what are known as "White standard errors," introduced in his Econometrica (1980) paper. These are also sometimes called "robust standard errors" or "heteroskedasticity-consistent standard errors." Under [H1] and [H2], with Zi := (Xi − μX)Vi, the slope estimator satisfies

  √n (β̂n − β0)  →d  N( 0, E(Zi²) / [var(Xi)]² ).

The corresponding variance estimator (used in t-tests) replaces the moments with their sample analogues:

  V̂ = [ (1/n) Σ (Xi − X̄n)² Ûi² ] / [ (1/n) Σ (Xi − X̄n)² ]².

This variance estimator is consistent under [H1] and [H2].

Weighted least squares estimator
Another way to deal with the heteroskedasticity issue is the weighted least squares (WLS) method. Given [C1] and [C2](i), suppose the conditional variance takes the known form

  var(Vi | Xi) = h(Xi),

where h(·) is a given function.
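White's heteroskedasticity-consistent standard error described above can be sketched as follows (NumPy assumed; the data-generating process, with conditional standard deviation 0.5·Xi, is an arbitrary illustration of heteroskedasticity):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated heteroskedastic data: conditional sd of V_i grows with X_i.
n = 500
x = rng.uniform(1.0, 10.0, size=n)
v = rng.normal(0.0, 0.5 * x)          # var(V_i | X_i) = (0.5 * x_i)^2
y = 1.0 + 2.0 * x + v

xd = x - x.mean()
Sxx = np.sum(xd ** 2)
b1 = np.sum(xd * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

# Classical (homoscedastic) SE of the slope.
s2 = np.sum(uhat ** 2) / (n - 2)
se_classical = np.sqrt(s2 / Sxx)

# White heteroskedasticity-consistent SE: sandwich form for simple regression,
# sqrt( sum((x_i - xbar)^2 * uhat_i^2) ) / Sxx.
se_white = np.sqrt(np.sum(xd ** 2 * uhat ** 2)) / Sxx
print(se_classical, se_white)
```

Under heteroskedasticity the two standard errors generally differ; t-tests should then be based on the robust version.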
We can use the following transformation to estimate the parameters: divide the model through by √h(Xi), so that

  Yi / √h(Xi) = α0 (1/√h(Xi)) + β0 (Xi/√h(Xi)) + Vi/√h(Xi),

i.e., Yi* = α0 Wi + β0 Xi* + Vi*, where Yi* := Yi/√h(Xi), Wi := 1/√h(Xi), Xi* := Xi/√h(Xi), and Vi* := Vi/√h(Xi).

Applying the OLS method to the transformed variables gives the WLS estimator. We can verify that the transformed variables satisfy [C1] and [C2](i). Defining Vi* := Vi/√h(Xi), we have

  var(Vi* | Xi) = var(Vi | Xi) / h(Xi) = 1,

so the transformed model satisfies conditional homoscedasticity.
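The WLS transformation above can be sketched numerically. In this illustration (NumPy assumed), the variance function h(Xi) = Xi² and the true parameters α0 = 1, β0 = 2 are arbitrary choices; dividing through by √h(Xi) = Xi yields a homoscedastic transformed model to which OLS is applied:

```python
import numpy as np

rng = np.random.default_rng(2)

# Heteroskedastic model with known form: var(V_i | X_i) = h(X_i) = X_i^2.
n = 400
x = rng.uniform(1.0, 10.0, size=n)
v = rng.normal(0.0, x)                # conditional sd = sqrt(h(x)) = x
y = 1.0 + 2.0 * x + v

# Transform: y_i/sqrt(h) = alpha * (1/sqrt(h)) + beta * (x_i/sqrt(h)) + v_i/sqrt(h).
w = 1.0 / np.sqrt(x ** 2)             # = 1/x
A = np.column_stack([w, x * w])       # transformed regressors [1/x, x/x = 1]
b = y * w                             # transformed response y/x

# OLS on the transformed variables = WLS estimator.
coef, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
alpha_wls, beta_wls = coef
print(alpha_wls, beta_wls)
```

The transformed errors Vi/Xi have unit variance, so the transformed regression satisfies the homoscedasticity condition and WLS recovers (α0, β0) consistently.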