Appendix to Part 2: Regression and individual conditional performance
1. The general model
Assume a general linear model
$$Y_i = a + b X_i + c S_i + u_i, \tag{1}$$
where $Y_i$ is the dependent variable, $a$ is a constant, $X_i$ is the independent variable, which will be graphed in the Y/X space and held constant for a given observation $i$, $S_i$ is a vector of other independent variables, $u_i$ is an error term, $b$ is a coefficient, $c$ is a vector of coefficients and $i$ indexes the observations.
The OLS estimate of model (1) can be written
$$\hat{Y}_i = \hat{a} + \hat{b} X_i + \hat{c} S_i, \tag{2}$$
where variables and parameters with a hat (^) are estimated. The difference between the observed variable $Y_i$ and the estimated variable $\hat{Y}_i$ is then the estimated residual
$$\hat{u}_i = Y_i - \hat{Y}_i. \tag{3}$$
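Equations (2) and (3) can be illustrated with a minimal sketch. It assumes a single control variable S (the text allows a vector) and uses the closed-form OLS solution for two regressors; the function name `fit_ols` and all data values are hypothetical.

```python
def fit_ols(y, x, s):
    """OLS of y on a constant, x and one control s (closed form, scalar S assumed)."""
    n = len(y)
    my, mx, ms = sum(y) / n, sum(x) / n, sum(s) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mx) ** 2 for xi in x)
    sss = sum((si - ms) ** 2 for si in s)
    sxs = sum((xi - mx) * (si - ms) for xi, si in zip(x, s))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ssy = sum((si - ms) * (yi - my) for si, yi in zip(s, y))
    det = sxx * sss - sxs ** 2  # zero only if x and s are perfectly collinear
    b_hat = (sss * sxy - sxs * ssy) / det
    c_hat = (sxx * ssy - sxs * sxy) / det
    a_hat = my - b_hat * mx - c_hat * ms
    return a_hat, b_hat, c_hat

# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
s = [0.5, 1.5, 1.0, 3.0, 2.0, 3.5]
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.05]
y = [1.0 - 0.5 * xi + 2.0 * si + e for xi, si, e in zip(x, s, noise)]

a_hat, b_hat, c_hat = fit_ols(y, x, s)
y_hat = [a_hat + b_hat * xi + c_hat * si for xi, si in zip(x, s)]  # equation (2)
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]                      # equation (3)
```

Because the regression includes a constant, the residuals `u_hat` sum to zero and are uncorrelated with both regressors in the sample, which is a quick check that the fit is correct.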
We are interested in the relation between Y and X, holding S constant; thus in the Y/X space we draw the conditional regression line
$$\tilde{Y}_i = \hat{a} + \hat{b} X_i + \hat{c} \bar{S}, \tag{4}$$
where $\bar{S}$ is the average of the $S_i$. In order to show the relation between the observations of Y and the estimated relation between Y and X conditional upon S given by the line (4), we define a conditional value of Y:
$$Y_i^c = Y_i - \hat{a} - \hat{c} S_i. \tag{5}$$
This conditional observation $Y_i^c$ is used and graphed e.g. by Barro (1991)¹ when he shows, in his Figure II, the relation between the growth rate of GDP (Y) and the level of GDP per capita in 1960 (X) conditional on the level of education (S) of country i.
What happens in Barro's graph is mainly a downward movement of the observations of high-GDP countries in the space, because the education effect (variable S) is subtracted from the growth rate on the Y-axis (equation 5). The variable X on the X-axis is the initial level of GDP.

¹ Barro, R. (1991), "Economic Growth in a Cross Section of Countries", The Quarterly Journal of Economics, 106(2): 407-443.
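The conditional line (4) and the conditional observation (5) can be computed as in the following sketch. The coefficient values and data are hypothetical placeholders rather than estimates from any real sample, and S is assumed to be a single variable.

```python
# Hypothetical estimates and data, for illustration only.
a_hat, b_hat, c_hat = 1.0, -0.5, 2.0
x = [1.0, 2.0, 3.0, 4.0]
s = [0.5, 1.5, 1.0, 3.0]
y = [1.6, 2.8, 1.55, 5.1]

s_bar = sum(s) / len(s)  # sample mean of S

# Equation (4): conditional regression line, with S held at its mean.
y_tilde = [a_hat + b_hat * xi + c_hat * s_bar for xi in x]

# Equation (5): conditional observation, removing the constant and the S effect.
y_cond = [yi - a_hat - c_hat * si for yi, si in zip(y, s)]
```

Plotting `y_cond` against `x` together with the line `y_tilde` would give the kind of conditional scatter described above; observations with a large S are shifted down the most when `c_hat` is positive.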
2. Comparison with the biased model
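The conditional observation $Y_i^c$ will differ from the original observation $Y_i$ and from a biased line estimated as
$$\breve{Y}_i = \breve{a} + \breve{b} X_i,$$
where the accented terms are estimated without the variables S, and going through the unconditional observations in the Y/X space.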
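The following sketch shows how the slope on X can change once S is left out. It assumes a single control S and noise-free hypothetical data in which S tends to rise with X; the helper names are illustrative only.

```python
def centered_sums(u, v):
    """Sum of cross-products of u and v around their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

def slope_without_s(y, x):
    """Slope of the regression of y on a constant and x only (the biased line)."""
    return centered_sums(x, y) / centered_sums(x, x)

def slope_with_s(y, x, s):
    """Coefficient on x when one control s is included, as in equation (2)."""
    sxx, sss, sxs = centered_sums(x, x), centered_sums(s, s), centered_sums(x, s)
    sxy, ssy = centered_sums(x, y), centered_sums(s, y)
    return (sss * sxy - sxs * ssy) / (sxx * sss - sxs ** 2)

# Hypothetical data generated with an X coefficient of -0.5 and an S coefficient of 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
s = [0.5, 1.5, 1.0, 3.0, 2.0, 3.5]
y = [1.0 - 0.5 * xi + 2.0 * si for xi, si in zip(x, s)]

b_biased = slope_without_s(y, x)      # absorbs part of the S effect
b_controlled = slope_with_s(y, x, s)  # recovers -0.5 up to rounding
```

In this constructed sample the slope estimated without S is positive even though the coefficient used to generate the data is negative, because the omitted variable S is positively correlated with X.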
3. Individual performance analysis
The conditional observation will also be located at a distance from the graphed line $\tilde{Y}_i$; from (4) and (5), this distance is
$$Y_i^c - \tilde{Y}_i = Y_i - \hat{c}(S_i + \bar{S}) - 2\hat{a} - \hat{b} X_i, \tag{6}$$
or, using (3) and (2):
$$Y_i^c - \tilde{Y}_i = \hat{u}_i - \hat{a} - \hat{c} \bar{S}. \tag{7}$$
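The step from (6) to (7) can be spelled out as follows. This is a sketch of the intermediate algebra, reconstructed from definitions (2) to (5): the first line subtracts (4) from (5), the second substitutes $Y_i = \hat{Y}_i + \hat{u}_i$ with $\hat{Y}_i$ taken from (2), and the third cancels terms.

$$
\begin{aligned}
Y_i^c - \tilde{Y}_i &= (Y_i - \hat{a} - \hat{c} S_i) - (\hat{a} + \hat{b} X_i + \hat{c} \bar{S}) \\
&= (\hat{a} + \hat{b} X_i + \hat{c} S_i + \hat{u}_i) - \hat{c} S_i - 2\hat{a} - \hat{b} X_i - \hat{c} \bar{S} \\
&= \hat{u}_i - \hat{a} - \hat{c} \bar{S}.
\end{aligned}
$$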
A confidence interval can be computed around $\tilde{Y}_i$. When the conditional value $Y_i^c$ is outside the confidence interval, then for observation i there is a strong divergence from the sample for a given value of $X_i$. This divergence is not explained by the specific value of the vector $S_i$ for observation i. It deserves case-study analysis before being simply attributed to good or bad luck.
The difference between the observed value of Y and the graphed line can also be computed. This difference is
$$Y_i - \tilde{Y}_i = \hat{u}_i + \hat{c}(S_i - \bar{S}), \tag{8}$$
and this includes the effect of the specific value of S on the performance Y of observation i. To separate the effect of the variables S, which can be controlled or at least explained, from the residual u, the following difference is useful:
$$Y_i - Y_i^c = \hat{a} + \hat{c} S_i. \tag{9}$$
The decompositions (8) and (9) are useful in performance analysis. An institution which has a relatively poor absolute performance on an indicator Y can claim that this is due to its mission to serve a difficult target group $X_i$, say of poor or disadvantaged people². Holding $X_i$ constant and given the group of institutions to which it is compared, the performance of this institution should be $\tilde{Y}_i$. A value above this is a better-than-expected performance; a value below is a worse-than-expected performance. It is possible to use (8) to decompose the difference into unexplained residual effects $\hat{u}_i$ and other policy choices $(S_i - \bar{S})$. Of course, these differences must be subjected to statistical significance tests.
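As an illustration of decomposition (8), the sketch below splits the gap between one institution's observed performance and the conditional line into a residual part and a part due to S differing from the sample mean. The function name `decompose`, the coefficient values and the institution's data are all hypothetical, S is assumed to be a single variable, and no significance test is performed.

```python
def decompose(y_i, x_i, s_i, a_hat, b_hat, c_hat, s_bar):
    """Split Y_i minus the conditional line into residual and S effect, as in (8)."""
    y_tilde = a_hat + b_hat * x_i + c_hat * s_bar  # equation (4)
    y_fit = a_hat + b_hat * x_i + c_hat * s_i      # equation (2)
    residual = y_i - y_fit                         # equation (3)
    policy = c_hat * (s_i - s_bar)                 # effect of S_i differing from the mean
    return {"gap": y_i - y_tilde, "residual": residual, "policy": policy}

# Hypothetical institution and estimates, for illustration only.
parts = decompose(y_i=0.9, x_i=2.0, s_i=3.0,
                  a_hat=0.2, b_hat=0.1, c_hat=0.15, s_bar=2.0)
```

Here the gap of about 0.2 above the expected performance splits into roughly 0.15 attributable to the above-average value of S and roughly 0.05 left in the residual.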
The relations between the observed (1), the estimated (2), the conditional (5) and the conditional estimated (4) values of Y are presented in the graph below. The relative positions of the Y's are purely arbitrary, because they actually depend upon the signs of $\hat{u}_i$, $\hat{c}$ and $(S_i - \bar{S})$.
² Imagine a relation between financial performance (FSS, on the Y-axis) and the initial wealth of the borrowers (on the X-axis). Some institutions serving poor borrowers may nevertheless have a good performance due to efficient monitoring (measured by the frequency of instalments, variable S) or to good luck (residual u).
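[Graph not reproduced. Labels recoverable from the original figure: vertical axis Y; the observed value $Y_i$, the conditional value $Y_i^c$ and the point $\tilde{Y}_i$ on the conditional line $\tilde{Y} = \hat{a} + \hat{b} X_i + \hat{c} \bar{S}$; the residual $\hat{u}_i$; the term $\hat{a} + \hat{c} \bar{S}$; and the axis marks 0, $S_i$ and $\bar{S}$.]
Graph A.1. Performance analysis of Y for i given X and/or S.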
Barro (1991), Figure 1: the biased regression and the unconditional observations.
Barro (1991), Figure 2: the conditional regression and the conditional observations.