Basic Econometrics
Chapter 4: Classical Normal Linear Regression Model (CNLRM)
Iris Wang
[email protected]
Sampling distributions
• We have studied the expected value and variance of the OLS estimators.
• In order to do inference, we need to know the full sampling distribution of the estimator.
• To make this sampling distribution tractable, we now assume that the unobserved error term (u) is normally distributed in the population.
• This is often referred to as the normality assumption (Assumption 10).
Assumption 10: Normality
• We continue to make the assumptions introduced in the previous lecture (linear regression, no perfect collinearity, zero conditional mean, homoskedasticity, …).
• And we add the following:
• Assumption 10 (Normality): The population error u is independent of the explanatory variables x1, x2, …, xk, and is normally distributed with zero mean and variance σ2: u ~ Normal(0, σ2).
Recap: The normal distribution
• The normal distribution is very widely used in statistics & econometrics (one reason is that normality simplifies probability calculations).
• A normal random variable is a continuous random variable that can take on any value.
• The probability density function (pdf) for the normal distribution has the familiar bell shape.
[Figure: bell-shaped pdf of the normal distribution]
• The mathematical formula for the pdf is f(x) = 1/(σ√(2π)) · exp(−(x − μ)2/(2σ2)), where μ is the mean and σ2 is the variance.
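As a quick sanity check, here is a minimal Python sketch that evaluates this formula directly and confirms that it matches SciPy's built-in normal pdf (NumPy and SciPy are assumed to be available):

```python
import numpy as np
from scipy import stats

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal pdf directly from the formula above."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-3, 3, 7)
print(normal_pdf(x))                       # values from the formula
print(stats.norm.pdf(x, loc=0, scale=1))   # SciPy's implementation (same values)
```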
Why are we assuming normality?
• Answer: It implies that the OLS estimator follows a normal distribution too. And this makes it straightforward to do inference.
• Under the CLM assumptions (1–7), conditional on the sample values of the independent variables,
β̂j ~ Normal(βj, Var(β̂j)).
• This result implies that
(β̂j − βj) / sd(β̂j) ~ Normal(0, 1).
• In words, this says that the deviation between the estimated value and the true parameter value, divided by the standard deviation of the estimator, is normally distributed with mean zero and variance equal to 1.
On p.100:
• The assumptions 1–7 are called the classical linear model (CLM) assumptions.
• One immediate implication of the CLM assumptions is that, conditional on the explanatory variables, the dependent variable y has a normal distribution with constant variance: y|x ~ Normal(β0 + β1x1 + … + βkxk, σ2) (p.101).
How can we justify the normality assumption?
• Central limit theorem (CLT): the error u is the sum of many different factors; and by the CLT the sum of many random variables is normally distributed.
• This argument is not without weaknesses (e.g. it doesn't hold if u is not additive).
• Whether normality holds in a particular application is an empirical matter, which can be investigated.
• Sometimes using a transformation – e.g. taking the log – yields a distribution that is closer to normal.
Example: CEO Salary and Return on Equity
• Data: CEOSAL1.SAV (available on course website)
• Salaries expressed in thousands of USD.
• It would be interesting to look at the sample distributions of salary in the different scales.

Sample distributions of CEO salaries in levels & logs
[Figure: histograms of CEO salary in levels and in logs]
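A minimal Python sketch of this comparison, assuming the data have been exported to a CSV file with a "salary" column (the file name and column name are assumptions based on the standard CEOSAL1 dataset):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the CEO salary data (salary in thousands of USD)
df = pd.read_csv("ceosal1.csv")

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(df["salary"], bins=30)            # strongly right-skewed in levels
axes[0].set_title("CEO salary (levels)")
axes[1].hist(np.log(df["salary"]), bins=30)    # closer to normal in logs
axes[1].set_title("CEO salary (logs)")
plt.show()
```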
Basic Econometrics
Chapter 5: Interval Estimation and Hypothesis Testing
Iris Wang
[email protected]
Confidence intervals
• Once we have estimated the population parameter βj and obtained the associated standard error, we can easily construct a confidence interval (CI) for βj.
• (β̂j − βj) / se(β̂j) has a t distribution with n−k−1 degrees of freedom (df).
• Define a 95% confidence interval for βj as β̂j ± t0.025 · se(β̂j), where the constant t0.025 is the 97.5th percentile in the t distribution.
Confidence intervals
• Lower limit: β̂j − t0.025 · se(β̂j)
• Upper limit: β̂j + t0.025 · se(β̂j)
• Meaning of the CI: in 95 out of 100 cases, intervals constructed like the one above will contain the true βj.

Confidence intervals
• The width of the CI is proportional to the standard error of the estimator.
• The larger the se, the larger the width of the CI.
• The larger the se, the greater the uncertainty about the true value of the unknown parameter.
• How is the confidence interval affected by an increase in the level of confidence (e.g. from 95% to 99%)? Why?

Don't forget the CLM assumptions!
• Estimates of the confidence interval will not be reliable if the CLM assumptions do not hold.

Example:
• Data: wage1.sav
• These data were originally obtained from the 1976 Current Population Survey in the US.
• SPSS output:
Coefficients(a)

                 Unstandardized        Standardized
                 Coefficients          Coefficients                    95.0% Confidence Interval for B
Model            B        Std. Error   Beta           t        Sig.    Lower Bound   Upper Bound
1  (Constant)   -0.892    0.686                       -1.300   0.194   -2.239         0.456
   educ          0.541    0.053        0.405          10.143   0.000    0.436         0.645

a. Dependent Variable: wage
• Can you calculate these two CIs by yourself according to the formula?
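If you want to check your answer, here is a minimal Python sketch that reproduces both intervals from the reported coefficients and standard errors. The degrees of freedom (524, i.e. n − k − 1 with one regressor) are an assumption based on the standard wage1 sample of n = 526:

```python
from scipy import stats

def conf_int(b, se, df, level=0.95):
    """Confidence interval: b +/- t_crit * se."""
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df)
    return b - t_crit * se, b + t_crit * se

df = 526 - 1 - 1  # assumed sample size from the standard wage1 data
print(conf_int(-0.892, 0.686, df))  # (Constant): approx (-2.239, 0.456)
print(conf_int(0.541, 0.053, df))   # educ:       approx (0.437, 0.645)
```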
Hypothesis Testing
• In Chapter 3 we learned that Assumptions 1–7 (such as linear regression, no perfect collinearity, zero conditional mean, homoskedasticity) enable us to obtain mathematical formulas for the expected value and variance of the OLS estimators.
• To test a hypothesis, we need to know the full sampling distribution of the estimator.
1. Sampling Distribution: Illustration
• Suppose we want to make statements about a population consisting of (say) 10 million individuals.
• The model is as follows: Y = β0 + β1*x + u
• Suppose we could draw (say) 100 samples from this population, where each sample consists of (say) 200 observations. Further suppose we would estimate 100 different regressions (one for each sample).
• This would generate 100 different estimates of our parameter of interest β1 – and they would form the distribution of our estimator.
Let's do this!
• Let's simulate 500 samples consisting of 200 individuals. Our model is Y = β0 + β1*x + u, where u is normally distributed (and all other assumptions hold too).
• Since we are simulating data, we can now choose the true parameters (this would obviously not be the case for real empirical applications). Let's choose β0 = 0 and β1 = 0.

Here's the distribution of the 500 different estimates of β1:
[Figure: histogram of the 500 estimates; x-axis b1, from -.2 to .2; y-axis density, from 0 to 6]
Mean of b1: -0.005
Std dev of b1: 0.072
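A minimal Python sketch of this Monte Carlo exercise. The distribution of x and the error variance are assumptions, since the slide does not specify them; with x ~ Normal(0, 1) and a standard normal error, the standard deviation of b1 comes out close to the 0.072 shown above:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 0.0, 0.0          # true parameters, chosen by us
n_samples, n_obs = 500, 200

estimates = []
for _ in range(n_samples):
    x = rng.normal(size=n_obs)   # assumed distribution for x
    u = rng.normal(size=n_obs)   # normally distributed error
    y = beta0 + beta1 * x + u
    # OLS slope: cov(x, y) / var(x)
    b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    estimates.append(b1)

print("Mean of b1:", np.mean(estimates))     # close to 0
print("Std dev of b1:", np.std(estimates))   # close to 0.072
```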
2. Why do we need to know the sampling distribution of the OLS estimator?
• Recall the formula for the t statistic: t = (β̂j − βj) / se(β̂j).
• In other words, the difference between the parameter estimate and a given (unknown) value of the true parameter, scaled by the standard error of the estimator, follows a t-distribution.
• This is very good news, because we know exactly what the t distribution looks like (statisticians have studied this distribution for many years).
• In particular, we know exactly how to compute probabilities using a t distribution – and this will be very useful when testing hypotheses (more on this shortly).
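For example, a minimal sketch of such probability calculations with SciPy (the numbers are illustrative, not from the slides):

```python
from scipy import stats

df = 30                          # illustrative degrees of freedom
print(stats.t.cdf(1.70, df))     # P(T <= 1.70), approx 0.95
print(stats.t.ppf(0.975, df))    # 97.5th percentile, approx 2.04
```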
• Here's the answer to the question: if we don't know the sampling distribution of the OLS estimator, we can't be sure that (beta_hat − beta)/se(beta_hat) follows a t-distribution.
• In that case, this quantity could follow any distribution, in which case there's no way of doing the probability analysis that underlies hypothesis testing.
Testing the null hypothesis
• In most applications, testing H0: βj = 0 is of central interest (j corresponds to any of the k independent variables in the model).
• Since βj measures the partial effect of xj on the expected value of y after controlling for other factors, the null hypothesis means that xj has no effect on the expected value of y.

Example: Wage equation
log(wage) = β0 + β1·education + u
• The null hypothesis H0: β1 = 0 means that education has no effect on hourly wage.
• Is this an economically interesting hypothesis?
• Now let's look at how we can carry out and interpret such a test.
• The test statistic we use to test H0: βj = 0 is called the t statistic or the t ratio of β̂j, and is defined as t = β̂j / se(β̂j).
• As you can see, the t statistic is easy to compute: just divide your coefficient estimate by the standard error.
• SPSS (and most other econometrics software) will do this for you.
• Since the se is always positive, the t statistic always has the same sign as the coefficient estimate.
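A quick check against the SPSS output shown earlier:

```python
print(0.541 / 0.053)  # t ratio for educ: approx 10.2 (SPSS reports 10.143,
                      # since B and Std. Error are rounded in the displayed output)
```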
Intuition
Two-tailed tests
• Consider a null hypothesis like H0: βj = 0 against a two-sided alternative like H1: βj ≠ 0.
• In words, H1 is that xj has a ceteris paribus effect on y, which could be either positive or negative.
Now let's decide on a significance level
• Significance level = probability of rejecting H0 when it is in fact true (i.e. a mistake).
• Let's decide on a 5% significance level (the most common choice): hence, we are willing to mistakenly reject H0 when it is true 5% of the time.
Two-sided (cont'd)
• To find the critical value of t (denoted by c), we first specify the significance level, say 5%.
• Since the test is two-tailed, c is then chosen to make the area in each tail equal 2.5% – i.e. c is the 97.5th percentile in the t distribution (again, with n−k−1 degrees of freedom).
• The graph shows that, if df=26, then c=2.06.
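A minimal sketch of this lookup in Python:

```python
from scipy import stats

print(stats.t.ppf(0.975, df=26))  # two-tailed 5% critical value, approx 2.06
```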
Econometric jargon: If H0: βj = 0 is rejected against a two-sided alternative, we may say that "xj is statistically significant at the 5% level". Thus we conclude that the effect of xj on y is not zero.
Testing against one-sided alternatives
• The rule for rejecting H0 depends on:
1. The alternative hypothesis (H1)
2. The chosen significance level of the test
• Let's begin by looking at a one-sided alternative of the form H1: βj > 0.
• Let's assume we decide to apply a 5% significance level, that is, α=5%.
One-tail test
• Under H0 (βj = 0), the t statistic has a t distribution.
• Under H1 (βj > 0), the expected value of the t statistic is positive.
• Denote the critical value by c.
• On p.118: Rejection rule: H0 is rejected in favor of H1 at the 5% significance level if t > c.
• We've seen how to obtain the t statistic. But how do we obtain c? To obtain c, we only need the significance level and the degrees of freedom (df). Example: For df = 28 and significance level 5%, c = 1.701.
• If our t statistic is less than 1.701, we do not reject H0.
• But if our t statistic is higher than 1.701, we do reject H0.
A few points worth noting
• As the significance level falls, the critical value increases. Why?
• If H0 is rejected at (say) the 1% level, it is automatically rejected at the 5% level too.
• What is the critical value c for:
  o A 10% significance level with df=21?
  o A 1% significance level with df=120?
• Confirm that, as the df gets large, the critical values for the t-distribution get very close to the critical values for the standard normal distribution.
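One way to look up such critical values (a sketch; shown here for one-sided tests, since the slide context is one-sided alternatives):

```python
from scipy import stats

# One-sided critical values: the (1 - alpha) percentile of the t distribution
print(stats.t.ppf(0.90, df=21))    # 10% level, df=21:  approx 1.32
print(stats.t.ppf(0.99, df=120))   # 1% level, df=120:  approx 2.36
# As df grows, t critical values approach the standard normal ones:
print(stats.norm.ppf(0.99))        # approx 2.33
```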
Example: The wage equation (Data: WAGE1.SAV)
Model: Based on the results below, test H0: β1=0 against H1: β1>0.

Coefficients(a)

Model            B        Std. Error   Beta    t        Sig.
1  (Constant)   -0.892    0.686                -1.300   0.194
   educ          0.541    0.053        0.405   10.143   0.000

a. Dependent Variable: wage
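A sketch of carrying out this one-sided test (df = 524 is an assumption based on the standard wage1 sample of n = 526):

```python
from scipy import stats

t_stat = 10.143                  # from the SPSS output
c = stats.t.ppf(0.95, df=524)    # 5% one-sided critical value, approx 1.65
print(t_stat > c)                # True: reject H0 in favor of beta1 > 0
```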
Testing other hypotheses about βj
• Although H0: βj=0 is the most common hypothesis, we sometimes want to test whether βj is equal to some other given constant aj. Suppose the null hypothesis is H0: βj = aj.
• In this case the appropriate t statistic is t = (β̂j − aj) / se(β̂j).
• Now go back and test the hypothesis that the educ coefficient in the regression above is equal to 1 (against a two-sided alternative). A sketch follows below.
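A sketch of this test (again assuming df = 524, from the standard wage1 sample):

```python
from scipy import stats

t_stat = (0.541 - 1) / 0.053      # approx -8.66
c = stats.t.ppf(0.975, df=524)    # two-sided 5% critical value, approx 1.96
print(abs(t_stat) > c)            # True: reject H0 that the educ coefficient is 1
```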
Computing p-values for t tests
• You have seen how the researcher chooses the significance level. There's no "correct" significance level.
• In practice, the 5% level is the most common one, but 10% is also frequently used (especially for small datasets), as is 1% (more common for large datasets).
• Given the observed value of the t statistic, what is the smallest significance level at which the null hypothesis would be rejected? This level is known as the p-value.
• Example: Suppose t = 1.85 and df = 40. This results in a p-value of 0.0718.
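A minimal sketch of this calculation (two-tailed, which matches the reported number):

```python
from scipy import stats

t_obs, df = 1.85, 40
p_value = 2 * (1 - stats.t.cdf(t_obs, df))  # two-tailed p-value
print(p_value)                              # approx 0.0718
```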
p-values in SPSS
• Correct interpretation: The p-value is the probability of observing a t value as extreme as we did if the null hypothesis is true. ☺
• Wrong interpretation (not uncommon): "The p-value is the probability that the null hypothesis is true…".
• Thus, small p-values are evidence against the null hypothesis. If the p-value is, say, 0.04, we might say there's significance at the 5% level (actually at the 4% level) but not at the 1% level (or 3% or 2% level).
Coefficients(a)

Model            B        Std. Error   Beta    t        Sig.
1  (Constant)   -0.892    0.686                -1.300   0.194
   educ          0.541    0.053        0.405   10.143   0.000

a. Dependent Variable: wage
Basic Econometrics
Chapter 6: Extensions of the Two-Variable Linear Regression Model
Iris Wang
[email protected]
Log-linear regression models
• In many cases relationships between economic variables may be non-linear.
• However, we can distinguish between functional forms that are intrinsically non-linear and those that can be transformed into an equation to which we can apply ordinary least squares techniques.
Log-linear regression models
• Of those non-linear equations that can be transformed, the best known is the multiplicative power function form (sometimes called the Cobb-Douglas functional form), which is transformed into a linear format by taking logarithms.
Log-linear regression models: Production functions
• For example, suppose we have cross-section data on firms in a particular industry, with observations both on the output (Q) of each firm and on the inputs of labour (L) and capital (K).
• Consider the following functional form: Q = A·L^α·K^β.
Log-linear regression models
• Taking logarithms of both sides linearizes the model: lnQ = lnA + α·lnL + β·lnK.
• The parameters α and β can be estimated directly from a regression of the variable lnQ on lnL and lnK.
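A minimal Python sketch of this estimation on simulated firm data (the true parameter values and the data-generating choices below are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                  # number of firms (illustrative)
L = rng.lognormal(mean=3.0, size=n)      # labour input
K = rng.lognormal(mean=4.0, size=n)      # capital input
u = rng.normal(scale=0.1, size=n)        # error term, additive in logs
Q = 2.0 * L**0.6 * K**0.3 * np.exp(u)    # Cobb-Douglas with alpha=0.6, beta=0.3

# Regress lnQ on lnL and lnK (with a constant) by ordinary least squares
X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
coef, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
print("lnA, alpha, beta:", coef)         # close to (ln 2, 0.6, 0.3)
```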