General Regression Formulae

Single Predictor Standardized Parameter Model:
$$Z_{Y_i} = \beta Z_{X_i} + \varepsilon_i$$

Single Predictor Standardized Statistical Model:
$$\hat{Z}_{Y_i} = \hat{\beta} Z_{X_i}$$

Estimate of Beta (Beta-hat):
$$\hat{\beta} = r_{YX} \qquad (1)$$

Standard error of estimate:
$$s_{Z_Y.Z_X} = \sqrt{1 - r_{YX}^2} \qquad (2)$$

Standard error of Beta:
$$Se_{\hat{\beta}} = \sqrt{\frac{1 - r_{YX}^2}{N - 2}} \qquad (3)$$

There are two identical null hypotheses:
$$H_0\colon \beta = 0 \quad \text{and} \quad H_0\colon \rho = 0$$
Both are tested with a t-statistic with (N - 2) degrees of freedom (df), which can be computed two ways:
$$t_{(N-2)} = r\sqrt{\frac{N - 2}{1 - r_{YX}^2}} \qquad (4)$$
and
$$t_{(N-2)} = \frac{\hat{\beta} - 0}{Se_{\hat{\beta}}} \qquad (5)$$
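As a concrete check of formulas (1) through (5), the sketch below computes each quantity with Python and NumPy. The sample data (x, y) are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([1.5, 3.0, 4.5, 5.0, 7.5, 8.0])
N = len(y)

r = np.corrcoef(x, y)[0, 1]                    # r_YX
beta_hat = r                                   # formula (1): Beta-hat = r
se_est = np.sqrt(1 - r**2)                     # formula (2): standard error of estimate
se_beta = np.sqrt((1 - r**2) / (N - 2))        # formula (3): standard error of Beta

t_from_r = r * np.sqrt((N - 2) / (1 - r**2))   # formula (4)
t_from_beta = (beta_hat - 0) / se_beta         # formula (5); identical to (4)
print(t_from_r, t_from_beta)
```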
___________________________________________________________________________
Single Predictor Raw Score Parameter Model:
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$

Single Predictor Raw Score Statistical Model:
$$\hat{Y} = a + b_1 X_1$$

Estimate of Beta (b):
$$b = \hat{\beta}\,\frac{s_Y}{s_X} \qquad (6)$$

Since Beta-hat and r are identical in the single predictor model, r can be substituted.

Estimate of the Y-intercept or regression constant (a):
$$a = \bar{Y} - b\bar{X} \qquad (7)$$

Standard error of estimate:
$$s_{Y.X} = s_Y\sqrt{1 - r_{YX}^2} \qquad (8)$$

Standard error of b:
$$Se_b = \sqrt{\frac{s_{Y.X}^2}{(N-1)\,s_X^2}} \qquad (9)$$

There are two identical null hypotheses:
$$H_0\colon \beta = 0 \quad \text{and} \quad H_0\colon \rho = 0$$

Both are tested with a t-statistic with (N - 2) degrees of freedom (df), which can be computed two ways: again with formula (4), and with
$$t_{(N-2)} = \frac{b - 0}{Se_b} \qquad (10)$$
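A companion sketch for the raw score formulas (6) through (10), again with hypothetical data; ddof=1 requests the sample (N - 1) standard deviation.

```python
import numpy as np

# Hypothetical sample data
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([1.5, 3.0, 4.5, 5.0, 7.5, 8.0])
N = len(y)

r = np.corrcoef(x, y)[0, 1]
s_x = np.std(x, ddof=1)                        # sample standard deviations
s_y = np.std(y, ddof=1)

b = r * (s_y / s_x)                            # formula (6), with r substituted for Beta-hat
a = y.mean() - b * x.mean()                    # formula (7): regression constant
s_yx = s_y * np.sqrt(1 - r**2)                 # formula (8): standard error of estimate
se_b = np.sqrt(s_yx**2 / ((N - 1) * s_x**2))   # formula (9): standard error of b
t = (b - 0) / se_b                             # formula (10), df = N - 2
```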
Two Predictor Standardized Parameter Model:
$$Z_{Y_i} = \beta_1 Z_{X_{1i}} + \beta_2 Z_{X_{2i}} + \varepsilon_i$$

Two Predictor Standardized Statistical Model:
$$\hat{Z}_{Y_i} = \hat{\beta}_1 Z_{X_{1i}} + \hat{\beta}_2 Z_{X_{2i}}$$

To calculate Beta-hat, the correlation between the predictor variables must be taken into consideration:
$$\hat{\beta}_1 = \frac{r_{Y1} - r_{Y2}\,r_{12}}{1 - r_{12}^2} \qquad (11)$$
and
$$\hat{\beta}_2 = \frac{r_{Y2} - r_{Y1}\,r_{12}}{1 - r_{12}^2} \qquad (12)$$
Similar to formula (2), the standard error of estimate is:
$$s_{Z_Y.Z_{12}} = \sqrt{1 - R_{Y.12}^2} \qquad (13)$$

In the two predictor case, the standard error of Beta-hat is the same for both variables:
$$Se_{\hat{\beta}} = \sqrt{\frac{1 - R_{Y.12}^2}{(N-3)(1 - r_{12}^2)}} \qquad (14)$$
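Formulas (11) through (14) need only the three pairwise correlations and N, so they are easy to verify numerically. A minimal sketch, assuming hypothetical correlation values:

```python
import numpy as np

# Hypothetical correlations among Y, X1, and X2, plus sample size
r_y1, r_y2, r_12 = 0.50, 0.40, 0.30
N = 50

beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)   # formula (11)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)   # formula (12)

R2 = beta1 * r_y1 + beta2 * r_y2               # multiple R-squared; formula (16) below
se_est = np.sqrt(1 - R2)                       # formula (13)
se_beta = np.sqrt((1 - R2) / ((N - 3) * (1 - r_12**2)))   # formula (14): same for both predictors
```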
However, there is more than one null hypothesis that can be tested.

First, one can test whether the overall model significantly improves prediction over the mean:
$$H_0\colon \beta_1 = \beta_2 = 0$$

This is tested with an F-statistic with 2 (the number of predictors) and (N - 3) dfs:
$$F_{(2,\,N-3)} = \frac{(R^2 - 0)/2}{(1 - R^2)/(N - 3)} \qquad (15)$$

Multiple R² has a general formula:
$$R_Y^2 = \sum_{j=1}^{k}\hat{\beta}_j\,r_{Yj} = \hat{\beta}_1 r_{Y1} + \hat{\beta}_2 r_{Y2} + \ldots + \hat{\beta}_k r_{Yk} \qquad (16)$$
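Continuing with the same hypothetical correlations, formulas (15) and (16) give the overall test (the inputs are restated so the snippet runs on its own):

```python
# Hypothetical correlations and sample size, as in the previous sketch
r_y1, r_y2, r_12 = 0.50, 0.40, 0.30
N, k = 50, 2

beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)

R2 = beta1 * r_y1 + beta2 * r_y2               # formula (16) with k = 2
F = ((R2 - 0) / k) / ((1 - R2) / (N - k - 1))  # formula (15): df = (2, N - 3)
```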
One may also test whether each predictor makes a significant improvement in prediction over the other predictor(s). This is tested with a t-test with (N - k - 1) degrees of freedom, where k equals the number of predictors (in this case k = 2).

For any variable j:
$$t_{(N-k-1)} = \frac{\hat{\beta}_j - 0}{Se_{\hat{\beta}_j}} \qquad (17)$$
where
$$Se_{\hat{\beta}_j} = \sqrt{\frac{1 - R_{Y.12 \ldots k}^2}{(N-k-1)(1 - R_{j.12 \ldots k}^2)}} \qquad (18)$$
This can also be tested with a more flexible F-statistic:
$$F_{(k_F - k_R,\; N - k_F - 1)} = \frac{(R_F^2 - R_R^2)/(k_F - k_R)}{(1 - R_F^2)/(N - k_F - 1)} \qquad (19)$$
where the subscripts F and R denote the full and reduced (nested) models, and k_F and k_R are their respective numbers of predictors.
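A sketch of formula (19), assuming hypothetical R² values for a nested pair of models:

```python
# Hypothetical R-squared values for two nested models
R2_full, k_full = 0.36, 3                      # full model, k_F = 3 predictors
R2_reduced, k_reduced = 0.30, 1                # reduced model, k_R = 1 predictor
N = 50

# formula (19): F with (k_F - k_R, N - k_F - 1) degrees of freedom
F_change = ((R2_full - R2_reduced) / (k_full - k_reduced)) \
           / ((1 - R2_full) / (N - k_full - 1))
```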
Two Predictor Raw Score Parameter Model:
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

Two Predictor Raw Score Statistical Model:
$$\hat{Y} = a + b_1 X_1 + b_2 X_2$$

For any variable j, the estimate of Beta (b_j):
$$b_j = \hat{\beta}_j\,\frac{s_Y}{s_{X_j}} \qquad (20)$$

Estimate of the Y-intercept or regression constant (a):
$$a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 \qquad (21)$$

Similar to formula (8), the standard error of estimate:
$$s_{Y.12} = s_Y\sqrt{1 - R^2} \qquad (22)$$

Because of possible differences in variance across variables, each predictor variable has a different standard error of b. For any of the two variables denoted as j:
$$Se_{b_j} = \sqrt{\frac{s_{Y.12}^2}{s_{X_j}^2\,(N-1)(1 - r_{12}^2)}} \qquad (23)$$

Again, one can test whether the overall model significantly improves prediction over the mean, $H_0\colon \beta_1 = \beta_2 = 0$, which is tested with the F-statistic in formula (15).

Also similar to the standardized model, one may test whether each predictor makes a significant improvement in prediction over the other predictor(s). This is tested with a t-test with (N - k - 1) degrees of freedom, where k equals the number of predictors (in this case k = 2).

For any variable j:
$$t_{(N-k-1)} = \frac{b_j - 0}{Se_{b_j}} \qquad (24)$$
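A sketch of formulas (20) through (24), converting standardized betas back to the raw score metric; all summary statistics are hypothetical:

```python
import numpy as np

# Hypothetical summary statistics
r_y1, r_y2, r_12 = 0.50, 0.40, 0.30
s_y, s_x1, s_x2 = 4.0, 2.0, 5.0                # sample standard deviations
ybar, x1bar, x2bar = 20.0, 10.0, 30.0          # sample means
N, k = 50, 2

beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)   # formula (11)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)   # formula (12)
R2 = beta1 * r_y1 + beta2 * r_y2               # formula (16)

b1 = beta1 * s_y / s_x1                        # formula (20)
b2 = beta2 * s_y / s_x2
a = ybar - b1 * x1bar - b2 * x2bar             # formula (21)

s_y12 = s_y * np.sqrt(1 - R2)                  # formula (22)
se_b1 = np.sqrt(s_y12**2 / (s_x1**2 * (N - 1) * (1 - r_12**2)))  # formula (23)
se_b2 = np.sqrt(s_y12**2 / (s_x2**2 * (N - 1) * (1 - r_12**2)))

t1 = (b1 - 0) / se_b1                          # formula (24), df = N - k - 1
t2 = (b2 - 0) / se_b2
```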
__________________________________________________________________________
Partial correlations are used to statistically "control" for the effects of all other predictors. Partial correlations remove the effect of control variables from the variables of interest, including the dependent variable. Some researchers use them instead of Beta-hat to interpret variable "importance."

With one dependent variable (Y) and two predictors, the general formula is:
$$r_{Y1.2} = \frac{r_{Y1} - r_{Y2}\,r_{12}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{Y2}^2}} \qquad (25)$$
Semi-partial (sometimes referred to as part) correlations are an index of the "unique" correlation between variables. Semi-partial correlations remove the effect of a variable from all other predictors, but not from the dependent variable.

With one dependent variable (Y) and two predictors, the general formula is:
$$r_{Y(1.2)} = \frac{r_{Y1} - r_{Y2}\,r_{12}}{\sqrt{1 - r_{12}^2}} \qquad (26)$$

Squared semi-partial correlations are useful because they give the "unique" contribution a variable makes to the R² of a multiple regression model. For example, with two predictors, R² can be decomposed as follows:
$$R_{Y.12}^2 = r_{Y2}^2 + r_{Y(1.2)}^2$$
and conversely,
$$R_{Y.12}^2 = r_{Y1}^2 + r_{Y(2.1)}^2$$
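A sketch of formulas (25) and (26), including a numerical check that both decompositions of R² agree; the correlations are hypothetical:

```python
import numpy as np

# Hypothetical correlations
r_y1, r_y2, r_12 = 0.50, 0.40, 0.30

# formula (25): partial correlation of Y and X1, with X2 removed from both
r_y1_2 = (r_y1 - r_y2 * r_12) / (np.sqrt(1 - r_12**2) * np.sqrt(1 - r_y2**2))

# formula (26): semi-partial correlations, with the other predictor
# removed from X1 (or X2) only, not from Y
r_y_12 = (r_y1 - r_y2 * r_12) / np.sqrt(1 - r_12**2)
r_y_21 = (r_y2 - r_y1 * r_12) / np.sqrt(1 - r_12**2)

# Both orderings of the decomposition recover the same R-squared
R2_a = r_y2**2 + r_y_12**2
R2_b = r_y1**2 + r_y_21**2
assert np.isclose(R2_a, R2_b)
```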
Source Table for Multiple Regression

Although this process would be laborious, this is the conceptual derivation for the F-ratio in Multiple Regression.
______________________________________________________________________________
Source                 Sum of Squares     df         Mean Squares         F
______________________________________________________________________________
Regression             Σ(Ŷᵢ - Ȳ)²         k          SS_R / k             MS_R / MS_e
(Explained Variance)

Residual               Σ(Yᵢ - Ŷᵢ)²        N - k - 1  SS_e / df_e
(Error Variance)

Total Variance         Σ(Yᵢ - Ȳ)²         N - 1      s² = SS_T / (N - 1)
______________________________________________________________________________
where N = total number of cases, k = number of predictors, Ȳ = the mean of Y,
Yᵢ = each individual score on Y, and Ŷᵢ = each individual predicted Y.

Given R² = SS_R / SS_T, the same table can be written in terms of R²:
______________________________________________________________________________
Source                 Sum of Squares     df         MS                   F
______________________________________________________________________________
Regression             R² SS_T            k          SS_R / k             (R² / k) / [(1 - R²) / (N - k - 1)]
(Explained Variance)

Residual               (1 - R²) SS_T      N - k - 1  SS_e / (N - k - 1)
(Error Variance)

Total Variance         Σ(Yᵢ - Ȳ)²         N - 1      s² = SS_T / (N - 1)
______________________________________________________________________________
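The equivalence of the two tables can be verified numerically. A minimal sketch, assuming simulated data and using NumPy's least squares solver for the fitted values:

```python
import numpy as np

# Hypothetical data: N cases, k = 2 predictors
rng = np.random.default_rng(0)
N, k = 50, 2
X = rng.normal(size=(N, k))
y = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(size=N)

# Least squares fit with an intercept column
Xd = np.column_stack([np.ones(N), X])
coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
y_hat = Xd @ coef

SS_R = np.sum((y_hat - y.mean())**2)   # regression (explained) sum of squares
SS_e = np.sum((y - y_hat)**2)          # residual (error) sum of squares
SS_T = np.sum((y - y.mean())**2)       # total sum of squares

MS_R = SS_R / k
MS_e = SS_e / (N - k - 1)
F = MS_R / MS_e                        # F from the first table

R2 = SS_R / SS_T
F_from_R2 = (R2 / k) / ((1 - R2) / (N - k - 1))   # F from the R-squared table
assert np.isclose(F, F_from_R2)
```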
One-Way ANOVA Source Table

When we extend least squares regression methodology to a continuous dependent variable Y and categorical independent variables, it is often referred to as the ANalysis Of VAriance (ANOVA). In the ANOVA, the predicted score, Ŷᵢ, for each individual in the jth group is equal to their (jth) group mean: Ŷᵢ = Ȳⱼ. Knowing this, the previous source tables simplify greatly. For the one-way (one categorical independent variable) ANOVA, the source table is as follows:
______________________________________________________________________________
Source                 Sum of Squares     df     Mean Squares        F
______________________________________________________________________________
Between Groups         Σ nⱼ(Ȳⱼ - Ȳ*)²     J - 1  SS_B / (J - 1)      MS_B / MS_W
(Explained Variance)

Within Groups          Σ(Yᵢ - Ȳⱼ)²        N - J  SS_W / df_W
(Error Variance)

Total Variance         Σ(Yᵢ - Ȳ*)²        N - 1  s² = SS_T / (N - 1)
______________________________________________________________________________
where N = total number of cases, J = number of groups, Ȳ* = the grand mean of Y across all groups, Yᵢ = each individual score on Y, Ȳⱼ = the mean for group j, and nⱼ = the number of cases in group j.

R² = η² = SS_B / SS_T.
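A minimal sketch of the one-way ANOVA source table, assuming three hypothetical groups of scores:

```python
import numpy as np

# Hypothetical scores for J = 3 groups
groups = [
    np.array([4.0, 5.0, 6.0, 5.5]),
    np.array([7.0, 8.0, 6.5, 7.5, 8.5]),
    np.array([5.0, 6.0, 5.5, 6.5]),
]
J = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()     # the grand mean, Y-bar*

# Between groups: group sizes times squared deviations of group means
SS_B = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)
# Within groups: deviations of scores from their own group mean
SS_W = sum(((g - g.mean())**2).sum() for g in groups)
SS_T = ((np.concatenate(groups) - grand_mean)**2).sum()

MS_B = SS_B / (J - 1)
MS_W = SS_W / (N - J)
F = MS_B / MS_W

eta2 = SS_B / SS_T                             # R-squared = eta-squared
assert np.isclose(SS_B + SS_W, SS_T)           # the sums of squares partition
```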