General Regression Formulae

Single Predictor Standardized Model

Parameter Model:    Z_{Y_i} = \beta Z_{X_i} + \varepsilon_i
Statistical Model:  \hat{Z}_{Y_i} = \hat{\beta} Z_{X_i}

Estimate of Beta (Beta-hat):

    \hat{\beta} = r_{YX}    (1)

Standard error of estimate:

    s_{Z_Y \cdot Z_X} = \sqrt{1 - r_{YX}^2}    (2)

Standard error of Beta:

    Se_{\hat{\beta}} = \sqrt{\frac{1 - r_{YX}^2}{N - 2}}    (3)

There are two identical null hypotheses: H_0: \beta = 0 and H_0: \rho = 0. Both are tested with a t-statistic with (N - 2) degrees of freedom (df), which can be computed two ways:

    t_{(N-2)} = r \sqrt{\frac{N - 2}{1 - r_{YX}^2}}    (4)

and

    t_{(N-2)} = \frac{\hat{\beta} - 0}{Se_{\hat{\beta}}}    (5)

___________________________________________________________________________

Single Predictor Raw Score Model

Parameter Model:    Y_i = \alpha + \beta X_i + \varepsilon_i
Statistical Model:  \hat{Y} = a + b_1 X_1

Estimate of Beta (b):

    b = \hat{\beta} \frac{s_Y}{s_X}    (6)

Since Beta-hat and r are identical in the single predictor model, r can be substituted.

Estimate of the Y-intercept or regression constant (a):

    a = \bar{Y} - b\bar{X}    (7)

Standard error of estimate:

    s_{Y \cdot X} = s_Y \sqrt{1 - r_{YX}^2}    (8)

Standard error of b:

    Se_b = \sqrt{\frac{s_{Y \cdot X}^2}{(N - 1)\, s_X^2}}    (9)

There are two identical null hypotheses: H_0: \beta = 0 and H_0: \rho = 0. Both are tested with a t-statistic with (N - 2) degrees of freedom, again with formula (4) and with

    t_{(N-2)} = \frac{b - 0}{Se_b}    (10)

___________________________________________________________________________

Two Predictor Standardized Model

Parameter Model:    Z_{Y_i} = \beta_1 Z_{X_{1i}} + \beta_2 Z_{X_{2i}} + \varepsilon_i
Statistical Model:  \hat{Z}_{Y_i} = \hat{\beta}_1 Z_{X_{1i}} + \hat{\beta}_2 Z_{X_{2i}}

To calculate Beta-hat, the correlation between the predictor variables must be taken into consideration:

    \hat{\beta}_1 = \frac{r_{Y1} - r_{Y2}\, r_{12}}{1 - r_{12}^2}    (11)

and

    \hat{\beta}_2 = \frac{r_{Y2} - r_{Y1}\, r_{12}}{1 - r_{12}^2}    (12)

Similar to formula (2), the standard error of estimate is:

    s_{Z_Y \cdot Z_X} = \sqrt{1 - R_{YX}^2}    (13)

In the two predictor case the standard error of Beta-hat is the same for both variables:

    Se_{\hat{\beta}} = \sqrt{\frac{1 - R_{Y \cdot 12}^2}{(N - 3)(1 - r_{12}^2)}}    (14)

However, there is more than one null hypothesis that can be tested. First of all, one can test whether the overall model significantly improves prediction over the mean, H_0: \beta_1 = \beta_2 = 0. This is tested with an F-statistic with 2 (the number of predictors) and (N - 3) dfs:

    F_{(2,\, N-3)} = \frac{(R^2 - 0)/2}{(1 - R^2)/(N - 3)}    (15)

Multiple R^2 has a general formula:

    R_Y^2 = \sum_{j=1}^{k} \hat{\beta}_j\, r_{Yj} = \hat{\beta}_1 r_{Y1} + \hat{\beta}_2 r_{Y2} + \ldots + \hat{\beta}_k r_{Yk}    (16)

One may also test whether each predictor makes a significant improvement in prediction over the other predictor(s). This is tested with a t-test with (N - k - 1) degrees of freedom, where k equals the number of predictors (in this case k = 2). For any variable j:

    t_{(N-k-1)} = \frac{\hat{\beta}_j - 0}{Se_{\hat{\beta}_j}}    (17)

where

    Se_{\hat{\beta}_j} = \sqrt{\frac{1 - R_{Y \cdot 12 \ldots k}^2}{(N - k - 1)(1 - R_{j \cdot 12 \ldots k}^2)}}    (18)

This can also be tested with a more flexible F-statistic comparing a full model (F, with k_F predictors) to a reduced model (R, with k_R predictors):

    F_{(k_F - k_R,\, N - k_F - 1)} = \frac{(R_F^2 - R_R^2)/(k_F - k_R)}{(1 - R_F^2)/(N - k_F - 1)}    (19)

___________________________________________________________________________

Two Predictor Raw Score Model

Parameter Model:    Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i
Statistical Model:  \hat{Y} = a + b_1 X_1 + b_2 X_2

For any variable j, the estimate of Beta (b_j):

    b_j = \hat{\beta}_j \frac{s_Y}{s_{X_j}}    (20)

Estimate of the Y-intercept or regression constant (a):

    a = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2    (21)

Similar to formula (8), the standard error of estimate:

    s_{Y \cdot 12} = s_Y \sqrt{1 - R^2}    (22)

Because of possible differences in variance across variables, each predictor variable has a different standard error of b. For either of the two variables, denoted j:

    Se_{b_j} = \sqrt{\frac{s_{Y \cdot 12}^2}{(N - 1)\, s_j^2\, (1 - r_{12}^2)}}    (23)

Again, one can test whether the overall model significantly improves prediction over the mean, H_0: \beta_1 = \beta_2 = 0, which is tested with the F-statistic in formula (15).
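As a worked illustration of the single-predictor formulae (1) through (10), the sketch below computes each quantity in plain Python; the x and y values are invented purely so the arithmetic can be checked.

```python
# A worked check of the single-predictor formulae (1)-(10);
# the x/y values below are invented purely for illustration.
import math

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.5, 3.0, 4.5, 6.0, 8.5]
N = len(x)

mx, my = sum(x) / N, sum(y) / N
sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (N - 1))
sy = math.sqrt(sum((v - my) ** 2 for v in y) / (N - 1))

# Pearson r, which is also Beta-hat in the standardized model -- formula (1)
r = sum((u - mx) * (v - my) for u, v in zip(x, y)) / ((N - 1) * sx * sy)

se_beta = math.sqrt((1 - r**2) / (N - 2))       # formula (3)
t4 = r * math.sqrt((N - 2) / (1 - r**2))        # formula (4)
t5 = r / se_beta                                # formula (5); identical to (4)

b = r * sy / sx                                 # formula (6)
a = my - b * mx                                 # formula (7)
s_yx = sy * math.sqrt(1 - r**2)                 # formula (8)
se_b = math.sqrt(s_yx**2 / ((N - 1) * sx**2))   # formula (9)
t10 = b / se_b                                  # formula (10)

# With s_{Y.X} computed as in (8), t10 = t4 * sqrt((N-1)/(N-2));
# an N-2 denominator in s_{Y.X} reproduces (4) exactly.
print(f"r = {r:.4f}, b = {b:.4f}, a = {a:.4f}")
print(f"t (4) = {t4:.4f}, t (5) = {t5:.4f}, t (10) = {t10:.4f}")
```
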
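The two-predictor standardized formulae need nothing beyond the three correlations among Y, X1, and X2. The sketch below traces formulae (11) through (17); the correlations and N are invented values, not results from any real dataset.

```python
# A worked check of the two-predictor standardized formulae (11)-(17),
# working directly from correlations; r_y1, r_y2, r12, and N are invented.
import math

r_y1, r_y2, r12 = 0.50, 0.40, 0.30
N, k = 50, 2

beta1 = (r_y1 - r_y2 * r12) / (1 - r12**2)   # formula (11)
beta2 = (r_y2 - r_y1 * r12) / (1 - r12**2)   # formula (12)

R2 = beta1 * r_y1 + beta2 * r_y2             # formula (16)

F = ((R2 - 0) / k) / ((1 - R2) / (N - 3))    # formula (15)

# Standard error of Beta-hat -- the same for both predictors, formula (14)
se_beta = math.sqrt((1 - R2) / ((N - 3) * (1 - r12**2)))

t1 = beta1 / se_beta                         # formula (17) for X1
t2 = beta2 / se_beta                         # formula (17) for X2

print(f"beta1-hat = {beta1:.4f}, beta2-hat = {beta2:.4f}, R^2 = {R2:.4f}")
print(f"F(2, {N - 3}) = {F:.3f}; t({N - k - 1}): {t1:.3f}, {t2:.3f}")
```
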
Similar to the standardized model, one may also test whether each predictor makes a significant improvement in prediction over the other predictor(s). This is tested with a t-test with (N - k - 1) degrees of freedom, where k equals the number of predictors (in this case k = 2). For any variable j:

    t_{(N-k-1)} = \frac{b_j - 0}{Se_{b_j}}    (24)

__________________________________________________________________________

Partial Correlations are used to statistically "control for" the effects of all other predictors. Partial correlations remove the effect of the control variables from the variables of interest, including the dependent variable. Some researchers use them instead of Beta-hat to interpret variable "importance." With one dependent variable (Y) and two predictors, the general formula is:

    r_{Y1 \cdot 2} = \frac{r_{Y1} - r_{Y2}\, r_{12}}{\sqrt{1 - r_{12}^2}\, \sqrt{1 - r_{Y2}^2}}    (25)

Semi-partial (sometimes referred to as part) correlations are an index of the "unique" correlation between variables. Semi-partial correlations remove the effect of a variable from all other predictors but not from the dependent variable. With one dependent variable (Y) and two predictors, the general formula is:

    r_{Y(1 \cdot 2)} = \frac{r_{Y1} - r_{Y2}\, r_{12}}{\sqrt{1 - r_{12}^2}}    (26)

Squared semi-partial correlations are useful because they give the "unique" contribution a variable makes to the R^2 of a multiple regression model. For example, with two predictors R^2 can be decomposed as follows:

    R_{Y \cdot 12}^2 = r_{Y2}^2 + r_{Y(1 \cdot 2)}^2

and conversely,

    R_{Y \cdot 12}^2 = r_{Y1}^2 + r_{Y(2 \cdot 1)}^2

______________________________________________________________________________

Source Table for Multiple Regression

Although the process would be laborious, this is the conceptual derivation of the F-ratio in multiple regression:

Source                  Sum of Squares                  df           Mean Squares           F
Regression              \sum (\hat{Y}_i - \bar{Y})^2    k            SS_R / k               MS_R / MS_e
(Explained Variance)
Residual                \sum (Y_i - \hat{Y}_i)^2        N - k - 1    SS_e / df_e
(Error Variance)
Total Variance          \sum (Y_i - \bar{Y})^2          N - 1        s^2 = SS_T / (N - 1)

where N = total number of cases, k = number of predictors, \bar{Y} = the mean of Y, Y_i = each individual score on Y, and \hat{Y}_i = each individual predicted Y.

Given R^2 = SS_R / SS_T, the same table can be written entirely in terms of R^2:

Source                  Sum of Squares           df           Mean Squares          F
Regression              R^2 \cdot SS_T           k            SS_R / k              (R^2 / k) / [(1 - R^2)/(N - k - 1)]
(Explained Variance)
Residual                (1 - R^2) \cdot SS_T     N - k - 1    SS_e / (N - k - 1)
(Error Variance)
Total Variance          \sum (Y_i - \bar{Y})^2   N - 1        s^2 = SS_T / (N - 1)

______________________________________________________________________________

One-Way ANOVA Source Table

When Least Squares Regression methodology is extended to a continuous dependent variable Y and categorical independent variables, it is often referred to as the ANalysis Of VAriance (ANOVA). In the ANOVA, the predicted score \hat{Y}_i for each individual in the jth group is equal to their (jth) group mean, \hat{Y}_i = \bar{Y}_j. Knowing this, the previous source tables simplify greatly. For the one-way (one categorical independent variable) ANOVA, the source table is as follows:

Source                  Sum of Squares                       df       Mean Squares        F
Between Groups          \sum n_j (\bar{Y}_j - \bar{Y}^*)^2   J - 1    SS_B / (J - 1)      MS_B / MS_W
(Explained Variance)
Within Groups           \sum (Y_i - \bar{Y}_j)^2             N - J    SS_W / df_W
(Error Variance)
Total Variance          \sum (Y_i - \bar{Y}^*)^2             N - 1    s^2 = SS_T / (N - 1)

where N = total number of cases, J = number of groups, \bar{Y}^* = the grand mean of Y across all groups, Y_i = each individual score on Y, \bar{Y}_j = the mean for group j, and n_j = the number of cases in group j.

    R^2 = \eta^2 = SS_B / SS_T
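As a numerical check of the partial and semi-partial formulae (25) and (26), the sketch below reuses the same invented correlations from the earlier two-predictor sketch and confirms that both semi-partial decompositions recover the same R^2.

```python
# A numerical check of formulae (25)-(26); the three correlations are the
# same invented values used in the earlier two-predictor sketch.
import math

r_y1, r_y2, r12 = 0.50, 0.40, 0.30

# Partial correlation of Y and X1, removing X2 from both -- formula (25)
r_y1_given_2 = (r_y1 - r_y2 * r12) / (
    math.sqrt(1 - r12**2) * math.sqrt(1 - r_y2**2))

# Semi-partial (part) correlations: X2 removed from the predictor only -- (26)
r_y1_sp = (r_y1 - r_y2 * r12) / math.sqrt(1 - r12**2)
r_y2_sp = (r_y2 - r_y1 * r12) / math.sqrt(1 - r12**2)

# Both decompositions should recover the same multiple R^2
R2_via_x1 = r_y2**2 + r_y1_sp**2
R2_via_x2 = r_y1**2 + r_y2_sp**2

print(f"partial r_Y1.2 = {r_y1_given_2:.4f}, semi-partial r_Y(1.2) = {r_y1_sp:.4f}")
print(f"R^2 = {R2_via_x1:.4f} = {R2_via_x2:.4f}")
```
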
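Finally, a sketch of the one-way ANOVA source table computed from raw scores; the three small groups (labeled g1 through g3) are invented purely for illustration.

```python
# Building the one-way ANOVA source table from scratch; the groups and
# scores below are invented for illustration.
groups = {
    "g1": [3.0, 4.0, 5.0],
    "g2": [6.0, 7.0, 8.0, 9.0],
    "g3": [2.0, 3.0, 4.0],
}

scores = [y for ys in groups.values() for y in ys]
N, J = len(scores), len(groups)
grand_mean = sum(scores) / N

# Between-groups SS: each person's predicted score is their group mean
ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                 for ys in groups.values())
# Within-groups SS: deviations of scores from their own group mean
ss_within = sum((y - sum(ys) / len(ys)) ** 2
                for ys in groups.values() for y in ys)
ss_total = sum((y - grand_mean) ** 2 for y in scores)

ms_between = ss_between / (J - 1)
ms_within = ss_within / (N - J)
F = ms_between / ms_within
eta2 = ss_between / ss_total   # R^2 = eta^2 = SS_B / SS_T

print(f"SS_B = {ss_between:.2f} (df = {J - 1}), SS_W = {ss_within:.2f} (df = {N - J})")
print(f"F({J - 1}, {N - J}) = {F:.3f}, eta^2 = {eta2:.3f}")
```
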