Lecture 4: Estimation in Multiple Regression, Continued

Estimation in the Two-Variable Regression Model -- Continued
Assumptions of the Model
Theoretical Properties of OLS Estimators
Review of Related Statistical Concepts
Standard Errors of OLS Estimates
Properties of OLS Estimates...
– in the CLR Model
– in the CNLR Model
Evaluating the Estimated SRF in Terms of:
Sign & Magnitude of Estimates & Sample Residuals
Standard Errors of Estimates
Coefficient of Determination
Assumptions of the Regression Model
• In regression analysis, we have two objectives:
– Estimating the PRF and…
– Drawing statistical inference about its parameters
• We also want our estimates to be reliable and
to enjoy certain desirable statistical properties
that would make them robust estimates.
• This requires us to make certain assumptions
about the PRF and the error term, u.
Assumptions of the Regression Model
1. Linearity in Parameters and Error Term
2. Non-random Independent Variables
– This means that we assume the independent
variables are fixed in repeated sampling.
3. Zero Mean Error
– In small samples this assumption might not
hold, but as the sample size increases, the mean
error tends to zero.
– As long as the regression model includes a
constant (intercept) term, this assumption is
automatically satisfied: any nonzero mean error
is simply absorbed into the intercept (see the
numerical check below).
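As a quick numerical illustration of the last point, here is a small sketch (not from the lecture; the data and seed are made up) showing that when OLS includes a constant, the sample residuals sum to zero even if the error term has a nonzero mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
# The error term deliberately has a NONZERO mean of 3.
y = 2.0 + 0.5 * x + rng.normal(loc=3.0, scale=1.0, size=100)

# OLS with an intercept via least squares.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta_hat
print(beta_hat[0])   # intercept estimate is near 2.0 + 3.0 = 5.0
print(resid.sum())   # ~0: with a constant term, residuals always sum to zero
```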
Assumptions of the Regression Model
4. Constant or Homoscedastic Error Variance
– This requires the error terms of all observations
to come from the same distribution, one with
zero mean and constant variance σ².
– If this is violated (i.e., if the error variance is
heteroscedastic), we lose precision when we
estimate the regression parameters.
Assumptions of the Regression Model
5. No Serial or Autocorrelation
– This means that ui and uj (i ≠ j) in cross-sectional
data, or ut and ut−1 in time-series data, are
uncorrelated.
– In other words, given Xi, the error term does not
follow a systematic pattern.
– We distinguish between positive and negative
autocorrelation.
Assumptions of the Regression Model
6. No Correlation Between X and u
– When we specify the PRF as a function of X
and u, we assume that these two variables have
separate and additive effects on Y.
7. Nonzero Degrees of Freedom, n > k
– For each unknown parameter we estimate,
we need at least one degree of freedom.
8. Nonzero Variance for Independent Variables
Assumptions of the Regression Model
9. No Specification Error in the...
– choice of independent variables
– functional form of the regression equation
– probabilistic assumptions on u
10. No Perfect Multicollinearity
11. No Measurement Error in Yi or Xi
12. The population error term, ui, follows the normal
distribution
Rationale for the Normality Assumption
Earlier, we said that in regression analysis we have
dual objectives: estimation and inference.
• As far as estimation is concerned, all we need to
generate "good" estimates is assumptions 1-11.
• But in order to draw statistical inference (i.e., test
hypotheses), we also need to make an assumption
regarding the probability distribution of the
population error term, ui.
Rationale for the Normality Assumption
•But why the normal distribution?
•Because, by the Central Limit Theorem, the sum of
a large number of small, independent influences tends
toward a normal distribution, and the error term can
be viewed as just such a sum.
The CLR and CNLR Models
Any regression model that satisfies assumptions
1-11 is known as a Classical Linear
Regression (CLR) Model, which is suitable for
estimation alone.
Any regression model that satisfies assumptions
1-12 is known as the Classical Normal Linear
Regression (CNLR) model, suitable not only
for estimation but also for testing hypotheses.
Probability Distribution of the Error Term &
the Dependent Variable in the CNLR Model
•Assumptions 3 (zero mean error), 4 (constant error
variance), 5 (no autocorrelation), and 12 (normality)
together imply that the ui are independently and
identically distributed (iid) normal with zero mean
and finite variance σ²:

ui ~ N(0, σ²)
Probability Distribution of the Error Term &
the Dependent Variable in the CNLR Model
•Because Yi is a linear function of ui, it follows that

Yi ~ N(β1 + β2Xi, σ²)
Probability Distribution of OLS Estimators
•Recall that β̂1, β̂2, …, β̂k are linear functions of Yi,
which has a normal distribution in the CNLR model.
•This, along with the fact that these estimators are
unbiased, implies that

β̂1 ~ N(β1, Var(β̂1))
β̂2 ~ N(β2, Var(β̂2))
…
β̂k ~ N(βk, Var(β̂k))
Standard Errors of Estimates
•Note the difference between a standard error and a
standard deviation: the standard error of an estimator
is the standard deviation of its sampling distribution,
a population concept, whereas a standard deviation
computed from data is a sample quantity.
Standard Errors of Estimates
•In the bivariate regression model, the true or
population variances of the OLS estimators are:

Var(β̂2) = σ² / Σ(Xi − X̄)²

Var(β̂1) = [ΣXi² / (n Σ(Xi − X̄)²)] σ²

•where σ² is the true or population variance of ui
(as well as of Yi).
Standard Errors of Estimates
•Note the following features of Var(β̂1) and Var(β̂2):
– Var(β̂2) is directly proportional to the error
variance σ² but inversely proportional to the
variation in X, Σ(Xi − X̄)².
– Because Σ(Xi − X̄)² grows with the sample
size, n, the variances of the slope and
intercept terms are inversely related to the
sample size.
Standard Errors of Estimates
– The variance of the intercept term, Var(β̂1), is
directly proportional to both the variance of Y (σ²)
and ΣXi², but is inversely proportional to the
variation in Xi and the sample size, n.
– Note that because β̂1 and β̂2 are random, in any
given sample they may be dependent on each
other. This can be studied by looking at the
covariance between the two:

Cov(β̂1, β̂2) = −X̄ Var(β̂2)
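These formulas translate directly into code. Below is a small Python sketch (my own illustration; the X values and σ² are made up) that evaluates Var(β̂1), Var(β̂2), and Cov(β̂1, β̂2):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
sigma2 = 4.0                    # assumed true error variance
X = rng.uniform(0, 10, size=n)  # made-up regressor values

Sxx = np.sum((X - X.mean()) ** 2)

var_b2 = sigma2 / Sxx                        # Var(b2_hat) = sigma^2 / sum(Xi - Xbar)^2
var_b1 = sigma2 * np.sum(X**2) / (n * Sxx)   # Var(b1_hat) = sigma^2 * sum(Xi^2) / (n * Sxx)
cov_b12 = -X.mean() * var_b2                 # Cov(b1_hat, b2_hat) = -Xbar * Var(b2_hat)

print(var_b1, var_b2, cov_b12)
```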
Standard Errors of Estimates in the
EViews Output
Dependent Variable: CONS
Method: Least Squares
Date: 09/06/99   Time: 16:07
Sample: 1959 1990
Included observations: 32

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           91.88780      10.69410     8.592383      0.0000
INCOME      0.758849      0.005049     150.2949      0.0000

R-squared           0.998674    Mean dependent var     1653.812
Adjusted R-squared  0.998629    S.D. dependent var     385.4033
S.E. of regression  14.26803    Akaike info criterion  8.214381
Sum squared resid   6107.298    Schwarz criterion      8.305989
Log likelihood     -129.4301    F-statistic            22588.55
Durbin-Watson stat  0.831129    Prob(F-statistic)      0.000000
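A table of this kind can also be produced outside EViews, for example with Python's statsmodels. The sketch below is a hypothetical replication: the cons and income arrays are made-up stand-ins, not the actual 1959-1990 series.

```python
import numpy as np
import statsmodels.api as sm

# Made-up stand-ins for the annual CONS and INCOME series (32 observations).
rng = np.random.default_rng(1)
income = np.linspace(800, 3000, 32)
cons = 91.9 + 0.759 * income + rng.normal(0, 14, size=32)

# OLS of CONS on a constant and INCOME, as in the EViews output.
results = sm.OLS(cons, sm.add_constant(income)).fit()
print(results.summary())  # coefficients, std. errors, t-statistics, R-squared, etc.
```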
Some Useful Probability Distributions
Related to the Normal Distribution
•Distributions closely related to the normal include
the t, chi-square (χ²), and F distributions.
•Please review the properties of these distributions
in Appendix 5A, pp. 159-161 of Gujarati.
Theoretical Properties of OLSE
• “There is an infinite number of estimators for
estimating a parameter since there is an infinite
number of ways in which a sample of data can
be used to produce a linear estimate for an
unknown parameter.
• “Some of these are 'bad' estimators and some
are 'good'. What distinguishes an econometrician is the ability to produce 'good'
estimators, which in turn produce 'good'
estimates.
Theoretical Properties of OLSE
• “One of these 'good' estimators could be chosen
as the 'best' estimator and used to generate the
'best' estimate of the parameter in question.”
(Kennedy, p. 5)
Theoretical Properties of OLSE
• In econometrics 'goodness' of estimators is
measured in terms of certain desirable
statistical properties.
• Underlying theoretical properties of estimators
is the notion that estimators are random
variables and thus have sampling or
probability distributions.
Review of Statistics
• Desirable statistical properties of point estimators are
divided into two groups:
– small or finite sample properties
– large sample or asymptotic properties
• Small or finite sample properties include:
– Unbiasedness
– Minimum Variance
– Efficiency
– Minimum Mean-Square-Error (MSE)
Review of Statistics
• Large sample properties include:
– Asymptotic unbiasedness
– Consistency
Small or Finite Sample Properties:
Unbiasedness
•An estimator, θ̂, is said to be an unbiased estimator
of the population parameter θ if, in repeated
sampling, its expected or mean value equals the true
parameter θ:

E(θ̂) = θ,  or  Bias(θ̂) = E(θ̂) − θ = 0
•Note that unbiasedness is a property of repeated
sampling, not of a single sample.
Small or Finite Sample Properties:
Minimum Variance
^
•An estimator,  , is said to be a minimum
estimator of  if it has the smallest variance
relative to all other estimators of .
•Note that a minimum variance estimator may or
may not be an unbiased estimator.
Small or Finite Sample Properties:
Efficiency
^
•An estimator,  , is said to be an efficient
estimator of  if it is both unbiased and has
minimum variance in the class of all unbiased
estimators.
•Thus efficiency is equivalent to minimumvariance unbiased or best unbiased estimator
(BUE).
Small or Finite Sample Properties:
Best Linear Unbiased Estimator (BLUE)
•An estimator, θ̂, is said to be the best linear
unbiased estimator (BLUE) of the true population
parameter θ if it is...
— linear
— unbiased
— minimum variance in the class of all linear
unbiased estimators of θ.
•Thus BLUE is equivalent to linear and efficient.
•Note that BLUE is a weaker property than BUE.
Small or Finite Sample Properties:
Minimum Mean Square Error (MSE)
•The MSE of an estimator, θ̂, is defined as follows:

MSE(θ̂) = Var(θ̂) + [Bias(θ̂)]²

•The minimum MSE property recognizes the possible
trade-off between variance and bias (the simulation
below checks this decomposition numerically).
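The following sketch (my own toy example; the shrunken sample mean is a deliberately biased estimator) verifies MSE = Var + Bias² by simulation:

```python
import numpy as np

rng = np.random.default_rng(7)
theta = 5.0  # true parameter

# A deliberately biased estimator: 0.9 times the sample mean of n = 25 draws.
estimates = np.array([0.9 * rng.normal(theta, 1.0, size=25).mean()
                      for _ in range(10_000)])

bias = estimates.mean() - theta
mse_direct = np.mean((estimates - theta) ** 2)  # E[(theta_hat - theta)^2]
mse_decomp = estimates.var() + bias ** 2        # Var + Bias^2

print(mse_direct, mse_decomp)  # the two agree up to simulation error (about 0.28)
```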
Large Sample or Asymptotic Properties:
Asymptotic Unbiasedness
•An estimator θ̂n, based on a sample of size n,
is an asymptotically unbiased estimator of θ if,
as n increases, the expected value of the
estimator approaches the true parameter.
•In other words, Bias(θ̂n) tends to zero as n
increases indefinitely.
Large Sample or Asymptotic Properties:
Consistency
•An estimator is consistent for a true but
unknown parameter if it approaches the true
population parameter as the sample size, n,
increases indefinitely.
•In other words, in the limit, the distribution of
a consistent estimator collapses onto the single
value, θ.
•A sufficient (but not necessary) condition for
consistency is that both the bias and the variance of
the estimator tend to zero as n increases, that is,
that the MSE tends to zero as n increases (see the
sketch below).
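The collapsing-distribution idea can be seen in a short simulation (my own sketch, borrowing the made-up PRF Yi = 15 + 1.5Xi + ui used in the Monte Carlo example later): the spread of the OLS slope estimates shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(11)
for n in (10, 100, 1000, 10_000):
    X = rng.uniform(0, 10, size=n)  # regressors fixed within each sample size
    slopes = [np.polyfit(X, 15 + 1.5 * X + rng.normal(size=n), 1)[0]
              for _ in range(200)]
    print(n, np.std(slopes))        # spread of slope estimates falls toward zero
```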
Properties of OLSE in the CLR Model:
Gauss-Markov Theorem
•In the CLR model, the OLS estimators are the
"best" in the sense that they are unbiased and
minimum variance (i.e., they are efficient) in the
class of all linear unbiased estimators.
•Thus, in the CLR model, the OLSE are BLUE.
•Moreover, the OLSE are consistent.
Properties of OLSE in the CLR Model:
Gauss-Markov Theorem
•The unbiasedness and consistency properties
follow from assumptions 3 and 6.
•The efficiency property follows from
assumptions 5 and 6.
Properties of OLSE in the CNLR Model:
Rao Theorem
In the CNLR model…
1. OLSE are BUE and consistent
2. (n − k)σ̂²/σ² ~ χ²(n − k)
3. The β̂s are distributed independently of σ̂²
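Result 2 can be checked by simulation. The sketch below (my own illustration, with made-up parameter values) draws repeated CNLR samples and compares the simulated mean and variance of (n − k)σ̂²/σ² with those of a χ²(n − k) distribution, namely n − k and 2(n − k):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma = 30, 2, 2.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])  # fixed regressors

draws = []
for _ in range(5_000):
    y = 1.0 + 0.5 * X[:, 1] + rng.normal(0, sigma, size=n)  # one CNLR sample
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)
    draws.append(rss / sigma**2)  # RSS/sigma^2 = (n-k)*sigma2_hat/sigma^2

print(np.mean(draws), n - k)       # simulated mean vs. chi-square mean
print(np.var(draws), 2 * (n - k))  # simulated variance vs. chi-square variance
```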
Introduction to Monte Carlo
Experiments
•Having concluded our discussion of theoretical
properties of OLS estimators in the CLR and
CNLR models, it is natural to ask…
Given that in practice we never observe the
probability distribution of an estimator, how can
we tell whether the OLS (or other) estimators have
such desirable statistical properties as BLUE?
Introduction to Monte Carlo
Experiments
•There are two possible answers to this question:
1. We can prove theoretically (mathematically)
that a given estimator does or does not enjoy
a certain statistical property.
2. Another way of ascertaining the sampling
distribution and other properties of an
estimator is to make use of a methodology
known as the Monte Carlo method or
experimentation.
Introduction to Monte Carlo
Experiments
•In this method, one simulates (i.e., imitates) the
“repeated sampling” concept by obtaining very
many point estimates of a parameter and using
them to examine/establish various properties for
the estimator in question.
•Let’s see how this is done in the context of a
numerical example.
Monte Carlo Experiments
•In this example, we experiment with samples of
size 100 and carry out 50 experiments.
Step 1: Generate Data
a. Make up a PRF by making up values for
its parameters, β1 and β2, e.g.,
Yi = 15 + 1.5Xi + ui
Monte Carlo Experiments
Step 1: Generate Data-- Continued
b. Make up a set of values for Xi (i = 1, 2,
…, 100), which are to be kept fixed
throughout the experiment.
c. Generate 50 sets of values for the
population error term, ui, with each set
containing 100 values. Do this using a
normal random-number generator such
that each set has zero mean and unit
variance.
Monte Carlo Experiments
Step 1: Generate Data-- Continued
d. Generate 50 sets of Yi values using the
PRF from step 1.a, the fixed set of Xi
values from step 1.b, and the 50 sets of
100 values for ui from step 1.c.
Monte Carlo Experiments
Step 2: Estimate the “made-up” PRF using
the generated data
•Step 1 yielded 50 samples of size 100 with the Yi
values varying from sample to sample but the Xi
values being fixed in all samples.
•Using OLS and each of the 50 samples of size 100,
estimate the two parameters of the made-up PRF.
•This yields 50 values for each of two supposedly
unknown population parameters, β1 and β2.
Monte Carlo Experiments
Step 3: Use the Estimates to Check the
Desired Result
•Use the fifty pairs of estimated values of the two
unknown parameters from step 2 to check
properties of the OLSE.
• For example, check the unbiasedness property of
the OLSE by finding the average of all 50
estimates of β2 to see if it is approximately equal to
the value you made up for it in step 1, which is 1.5.
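Here is a compact Python implementation of the three steps (my own rendering of the lecture's recipe; the uniform X values and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2024)
n, n_experiments = 100, 50
beta1, beta2 = 15.0, 1.5        # Step 1.a: made-up PRF Yi = 15 + 1.5*Xi + ui

X = rng.uniform(0, 20, size=n)  # Step 1.b: X fixed in repeated sampling
Xmat = np.column_stack([np.ones(n), X])

b1_hats, b2_hats = [], []
for _ in range(n_experiments):
    u = rng.normal(0.0, 1.0, size=n)  # Step 1.c: errors with zero mean, unit variance
    y = beta1 + beta2 * X + u         # Step 1.d: generate the Yi values
    b1, b2 = np.linalg.lstsq(Xmat, y, rcond=None)[0]  # Step 2: OLS estimates
    b1_hats.append(b1)
    b2_hats.append(b2)

# Step 3: the averages should be close to the made-up values 15 and 1.5.
print(np.mean(b1_hats), np.mean(b2_hats))
```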
Introduction to Monte Carlo
Experiments
For the results of the above example,
see the table distributed in class.
Evaluating the Estimated Model
•Once the model is estimated, we must evaluate the
sample regression function to see how well it fits the data.
• We do this in terms of…
— signs and magnitudes of estimated
coefficients and residuals
— standard errors of OLS estimates
— coefficient of determination
Evaluating the Estimated Model:
Signs and Magnitudes of Estimates
• We use economic theory to specify the PRF.
•Theory indicates not only which independent
variable(s) to include in the model; often it also
indicates what signs we should expect for the
coefficients associated with these variables.
• If we estimate a regression equation and observe a
sign reversal, i.e., a sign that goes against a priori
theoretical expectations, we should be alarmed.
• The sizes or magnitudes of the estimated
coefficients are also important.
Evaluating the Estimated Model:
Standard Errors of Estimates
•Point estimates are random variables as they
change from sample to sample.
•Thus, we must assess their precision or reliability.
• In statistics, the precision of estimates is
measured by their standard errors.
•All else the same, the smaller the standard error
of a point estimate, the more precise that estimate.
Evaluating the Estimated Model:
The Coefficient of Determination
• We need criteria for assessing the overall fit of
the sample regression line.
• In practice, because of sampling fluctuation, it is
impossible to obtain a perfect fit.
Coefficient of Determination
•In general, there will be some positive ûi and
some negative ones.
•How well the sample regression line fits the
sample data depends on how small the individual
ûi are.
•The simple sample coefficient of determination,
R2, is a measure that summarizes this information.
Coefficient of Determination
• We calculate R² as follows. Recall that

Yi = Ŷi + ûi

• Write this in deviation form:

yi = ŷi + ûi

• Square both sides and sum over all observations:

Σyi² = Σŷi² + Σûi² + 2Σŷiûi = Σŷi² + Σûi²

(the cross-product term drops out because the OLS
residuals are uncorrelated with the fitted values).
Coefficient of Determination
•The above equality consists of three sums of
squares, or variations:
1. Σyi², which represents the total variation of the
actual Y values about their sample mean. We
call this the total sum of squares (TSS).
2. Σŷi², which represents the variation of the
estimated Y values about their sample mean,
known as the explained sum of squares (ESS).
Coefficient of Determination
3. Σûi², representing the residual or unexplained
variation of the Y values about the regression
line, the residual sum of squares (RSS).
•In short,

Σyi² = Σŷi² + Σûi²

can be written

TSS = ESS + RSS
Coefficient of Determination
•Divide both sides of the last equality by TSS:

1 = (ESS/TSS) + (RSS/TSS) = Σŷi²/Σyi² + Σûi²/Σyi²

• Define

R² = ESS/TSS = Σŷi²/Σyi²
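The decomposition and R² can be computed directly from these definitions. A minimal sketch (my own example with made-up data):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(0, 2.0, size=50)

# OLS fit with an intercept, then fitted values.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual sum of squares

print(tss, ess + rss)            # TSS = ESS + RSS
print(ess / tss, 1 - rss / tss)  # two equivalent computations of R^2
```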
Coefficient of Determination
• R2 measures the proportion (percentage) of total
variation in Y that is explained by the regression
model as a whole.
• It has the following properties:
— It is bounded between zero and one.
— In the bivariate regression model, the square
root of R² equals, in absolute value, the simple
sample correlation coefficient between Y and X:

|rXY| = √R²
Coefficient of Determination in the
EViews Output
(The EViews output here is the same regression shown
earlier; the entries of interest are R-squared = 0.998674
and Adjusted R-squared = 0.998629.)