Analysis of Cross Section and Panel Data
Introductory Econometrics: A Modern Approach
Yan Zhang
School of Economics, Fudan University
CCER, Fudan University
Part 1. Regression Analysis on Cross Sectional Data
Chap 4. Multiple Regression Analysis: Inference
• So far: expected value, variance, BLUE. Inference requires the full sampling distribution of the OLS estimators.
• Assumption 6 (Normality): the population error u is independent of $x_1, \dots, x_k$ and normally distributed, $u \sim \mathrm{Normal}(0, \sigma^2)$.
• Assumption 6 implies Assumptions 3 and 5 (zero conditional mean; homoskedasticity).
• Classical Linear Model (CLM) assumptions: Assumptions 1-6.
The Efficiency of the OLS Estimators
The efficiency of the OLS estimators under different assumptions:
• Gauss-Markov assumptions: BLUE, the minimum variance linear unbiased estimators
• CLM assumptions: minimum variance unbiased estimators (among all unbiased estimators, not only those linear in the $y_i$)
The population assumptions of the CLM:
• Under Assumption 6, given x, the distribution of y is normal: $y \mid x \sim \mathrm{Normal}(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \sigma^2)$
• Clearly wrong in some cases, e.g. narr86; prate
• How?
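The Normality of the OLS Estimators
• Normality of u leads to normal sampling distributions of the OLS estimators: $\hat{\beta}_j \sim \mathrm{Normal}(\beta_j, \mathrm{Var}(\hat{\beta}_j))$
• Therefore, the standardized estimator is a standard normal random variable: $(\hat{\beta}_j - \beta_j)/\mathrm{sd}(\hat{\beta}_j) \sim \mathrm{Normal}(0, 1)$
More results:
• Any linear combination of the $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$ is also normally distributed, and any subset of the $\hat{\beta}_j$ has a joint normal distribution.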
Testing Hypothesis
Can Normality of u Be Assumed?
• Empirical matter: a transformation such as log(price) can bring the distribution closer to normal.
• CLT (central limit theorem): non-normality of the errors is not a serious problem with large sample sizes.
• Even though the $y_i$ are not from a normal distribution, the OLS estimators are approximately normally distributed, at least in large sample sizes (Chap. 5).
• Only the Gauss-Markov assumptions are needed: finite variance, homoskedasticity, zero conditional mean.
Other Topics about Large Samples
• Finite sample properties: unbiasedness; BLUE
• Large sample properties: the asymptotic properties
• Consistency
Inconsistency:
• The problem does not go away by adding more observations to the sample.
• It is very hard to derive the sign and magnitude of the inconsistency in the general k regressor case, just as it is hard to derive the bias.
• If, say, $x_1$ is correlated with u but the other independent variables are uncorrelated with u, all of the OLS estimators are generally inconsistent.
Asymptotic Normality
• Even without the normality assumption (Assumption MLR.6), t and F statistics have approximately t and F distributions, at least in large sample sizes (the OLS estimators are approximately normally distributed).
• Key assumption: homoskedasticity. If $\mathrm{Var}(y \mid x)$ is not constant, the usual t statistics and confidence intervals are invalid no matter how large the sample size is; the central limit theorem does not bail us out when it comes to heteroskedasticity.
• $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$.
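4.2 Testing Hypotheses about a Single Population Parameter: The t Test
• Null hypothesis: $H_0: \beta_j = 0$
• t statistic of $\hat{\beta}_j$: $t_{\hat{\beta}_j} = \hat{\beta}_j / \mathrm{se}(\hat{\beta}_j)$, which follows the $t_{n-k-1}$ distribution under $H_0$ and the CLM assumptions
• Significance level $\alpha$: the probability of rejecting $H_0$ when it is true
• Confidence level: $1 - \alpha$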
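Testing Against One-Sided Alternatives
• Alternative hypothesis: $H_1: \beta_j > 0$ (or $H_1: \beta_j < 0$)
• Critical value c and rejection rule:
  • If $H_1: \beta_j > 0$ and $t_{\hat{\beta}_j} > c$, reject $H_0$ in favor of $H_1$ at the significance level $\alpha$
  • If $H_1: \beta_j < 0$ and $t_{\hat{\beta}_j} < -c$, reject $H_0$ in favor of $H_1$ at the significance level $\alpha$
• To find c we only need to know the degrees of freedom and the significance level.
• If df > 120, use the critical value of the standard normal distribution.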
Testing Against Two-Sided Alternatives
• Alternative hypothesis: $H_1: \beta_j \neq 0$
• Critical value c and rejection rule: if $|t_{\hat{\beta}_j}| > c$, reject $H_0$ in favor of $H_1$ at the significance level $\alpha$
• c is the $100(1 - \alpha/2)$th percentile (the 97.5th when $\alpha = 0.05$) of the t distribution with n - k - 1 degrees of freedom.
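The one-sided and two-sided rejection rules can be sketched in a few lines of Python. This is only an illustration, assuming the large-df case (df > 120) so that standard normal critical values stand in for t critical values; the function name `t_test` and the numbers in the usage line are hypothetical, not taken from the lecture.

```python
from statistics import NormalDist

def t_test(beta_hat, se, alpha=0.05, alternative="two-sided"):
    """Test H0: beta_j = 0 with normal critical values (large-df approximation)."""
    t = beta_hat / se                       # t statistic
    z = NormalDist()                        # standard normal distribution
    if alternative == "two-sided":
        c = z.inv_cdf(1 - alpha / 2)        # e.g. 97.5th percentile for alpha = 0.05
        reject = abs(t) > c
    elif alternative == "greater":          # H1: beta_j > 0
        c = z.inv_cdf(1 - alpha)
        reject = t > c
    elif alternative == "less":             # H1: beta_j < 0
        c = z.inv_cdf(1 - alpha)
        reject = t < -c
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t, c, reject

# Hypothetical estimate and standard error
t, c, reject = t_test(0.5, 0.2, alpha=0.05, alternative="two-sided")
```

With small degrees of freedom the critical value should come from a t table instead of the normal distribution.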
P-Values for t Tests
P-values:
• Given the observed value of the t statistic, what is the smallest significance level at which the null hypothesis would be rejected?
• Small p-values are evidence against the null; large p-values provide little evidence against $H_0$.
• Once the p-value has been computed, a classical test can be carried out at any desired level.
• One-sided p-value: just divide the two-sided p-value by 2 (when the estimate has the sign predicted by the alternative).
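A short sketch of the p-value calculation follows. It again assumes a large sample, so the standard normal distribution approximates the t distribution; `p_values` and the t statistic 1.85 are hypothetical illustrations.

```python
from statistics import NormalDist

def p_values(t_stat):
    """Return (two_sided, one_sided) p-values under the normal approximation."""
    two_sided = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    one_sided = two_sided / 2               # valid when the sign matches H1
    return two_sided, one_sided

# Hypothetical t statistic: significant at the 10% level but not at 5% (two-sided)
two_sided, one_sided = p_values(1.85)
```

Comparing the returned p-value with any chosen significance level reproduces the classical test at that level.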

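Some Guidelines
• Language: say "we fail to reject the null hypothesis at the 5% level"
  • "we fail to reject H0 at the x% level"
• Statistical significance: determined by the size of $t_{\hat{\beta}_j}$
• Economic significance: determined by the size (and sign) of $\hat{\beta}_j$
• Large sample sizes often lead easily to statistical significance; as the sample size grows, consider using smaller and smaller significance levels to offset the shrinking standard errors.
Some guidelines for discussing the economic and statistical significance of a variable in a multiple regression model:
• Check statistical significance. If the variable is significant, discuss the magnitude of the coefficient (pay attention to the units of measurement and the functional form).
• If it is not significant, does the variable still have the expected effect on y? How large is the effect in practice? A larger p-value may be tolerated.
• A "wrong" sign on a variable with a small t statistic can be ignored; a significant variable with an unexpected sign is a real problem that is hard to resolve and calls for rethinking the model and functional form (omitted variables; endogeneity; heteroskedasticity).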
Example 4.2: Student Performance and School Size
• One claim: everything else being equal, students at smaller schools fare better than those at larger schools. This hypothesis is assumed to be true even after accounting for differences in class sizes across schools.
• Data: MEAP93.RAW, 408 high schools in Michigan, 1993
• Variables:
  • Student performance: percentage of students passing the MEAP standardized tenth-grade math test (math10)
  • School size: student enrollment (enroll)
  • Control factors (other factors that may affect student performance): average annual teacher compensation (totcomp); number of staff per one thousand students (staff)
• Null hypothesis: $H_0: \beta_{enroll} = 0$ against $H_1: \beta_{enroll} < 0$
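Example 4.2: Student Performance and School Size (continued)
• The estimated equation (standard errors in parentheses):
  $\widehat{math10} = 2.274 + .00046\,totcomp + .048\,staff - .00020\,enroll$
  (6.113) (.00010) (.040) (.00022); n = 408, $R^2 = .0541$
• Hypothesis tests:
  • df: 408 - 3 - 1 = 404, so standard normal critical values apply (one-sided 5% critical value: -1.65)
  • The t statistic on enroll: -.00020/.00022 ≈ -.91 > -1.65, so we fail to reject H0 in favor of H1 at the 5% level.
  • totcomp is statistically significant even at the 1% significance level; staff is not statistically significant at the 10% level.
• Functional form?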
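Student Performance and School Size: Functional Form
• Level-log model (standard errors in parentheses):
  $\widehat{math10} = -207.66 + 21.16\,\log(totcomp) + 3.98\,\log(staff) - 1.29\,\log(enroll)$
  (48.70) (4.06) (4.19) (0.69); n = 408, $R^2 = .0654$
• Hypothesis tests:
  • The t statistic on log(enroll): -1.29/.69 ≈ -1.87 < -1.65
  • We reject $H_0: \beta_{\log(enroll)} = 0$ in favor of $H_1: \beta_{\log(enroll)} < 0$ at the 5% level.
• Interpretation (holding totcomp and staff fixed): $\Delta\widehat{math10} \approx -(1.29/100)(\%\Delta enroll)$. Thus, if enrollment is 10% higher at a school, math10 is predicted to be about 0.13 percentage points lower (math10 is measured as a percent).
• Which one is better? The level-log model has a higher R-squared (Chap 6).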
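4.3 Confidence Intervals
• Confidence intervals (interval estimates): $\hat{\beta}_j \pm c \cdot \mathrm{se}(\hat{\beta}_j)$, where c is the $100(1 - \alpha/2)$th percentile of the $t_{n-k-1}$ distribution
• Meaning: in 95% of random samples, the true $\beta_j$ will lie inside the (95%) confidence interval.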
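As a small illustration of the interval estimate $\hat{\beta}_j \pm c \cdot \mathrm{se}(\hat{\beta}_j)$, the sketch below uses the standard normal percentile for c, which assumes a large number of degrees of freedom. The function name and the example numbers are hypothetical.

```python
from statistics import NormalDist

def confidence_interval(beta_hat, se, level=0.95):
    """Large-sample confidence interval: beta_hat +/- c * se."""
    c = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
    return beta_hat - c * se, beta_hat + c * se

# Hypothetical estimate and standard error
lo, hi = confidence_interval(0.5, 0.2)
```

Because zero lies outside this interval, the two-sided test of $H_0: \beta_j = 0$ at the 5% level would reject with the same numbers.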
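4.4 Testing Hypotheses about a Single Linear Combination of the Parameters
• E.g. $\log(wage) = \beta_0 + \beta_1 jc + \beta_2 univ + \beta_3 exper + u$
• Null hypothesis: $H_0: \beta_1 = \beta_2$ against $H_1: \beta_1 < \beta_2$
• Meaning: under H0, another year at a junior college and another year at a university lead to the same ceteris paribus percentage increase in wage.
• t statistic: $t = (\hat{\beta}_1 - \hat{\beta}_2)/\mathrm{se}(\hat{\beta}_1 - \hat{\beta}_2)$; the standard error depends on the covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$, which standard regression output does not report.
• Another way: define $\theta_1 = \beta_1 - \beta_2$ and substitute $\beta_1 = \theta_1 + \beta_2$ to get $\log(wage) = \beta_0 + \theta_1 jc + \beta_2 (jc + univ) + \beta_3 exper + u$. The coefficient on jc and its standard error then give $\hat{\theta}_1$ and $\mathrm{se}(\hat{\theta}_1)$ directly.
• Note: a confidence interval for $\theta_1$ uses the $\alpha/2$ critical value.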
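4.5 Testing Multiple Linear Hypotheses: The F Test
• Multiple hypotheses test (joint hypotheses test): $H_0: \beta_{k-q+1} = 0, \dots, \beta_k = 0$ (q exclusion restrictions)
• Unrestricted model: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$
• Restricted model: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_{k-q} x_{k-q} + u$
• F statistic: $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$
• Rejection rule: reject $H_0$ at the significance level $\alpha$ if F > c, where c is the $100(1-\alpha)$th percentile of the $F_{q,\,n-k-1}$ distribution
The R-Squared Form of the F Statistic
• F statistic: using $SSR = SST(1 - R^2)$ with the same SST in both models, $F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}$
• p-value: the probability that an $F_{q,\,n-k-1}$ random variable exceeds the observed F
• The F statistic for overall significance of a regression: $H_0: \beta_1 = \dots = \beta_k = 0$, $F = \frac{R^2/k}{(1 - R^2)/(n-k-1)}$
• Testing general linear restrictions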
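The SSR form and the R-squared form of the F statistic give the same number whenever the restricted and unrestricted models share the same dependent variable (same SST). The sketch below checks this with made-up sums of squares; all function names and numbers are hypothetical.

```python
def f_stat_ssr(ssr_r, ssr_ur, q, df_ur):
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / (n - k - 1)]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

def f_stat_r2(r2_r, r2_ur, q, df_ur):
    """F = [(R2_ur - R2_r) / q] / [(1 - R2_ur) / (n - k - 1)]."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

# Hypothetical numbers: n observations, k regressors, q restrictions
sst = 1000.0
ssr_ur, ssr_r = 400.0, 450.0
n, k, q = 100, 5, 2
df_ur = n - k - 1                         # unrestricted degrees of freedom

f_from_ssr = f_stat_ssr(ssr_r, ssr_ur, q, df_ur)
f_from_r2 = f_stat_r2(1 - ssr_r / sst, 1 - ssr_ur / sst, q, df_ur)
```

The resulting F would then be compared with the critical value of the F distribution with (q, n - k - 1) degrees of freedom.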
4.6 Reporting Regression Results
How to report multiple regression results for relatively complicated empirical projects:
• Include:
  • the estimated OLS coefficients (interpret the estimated coefficients of the key variables; state the units of measurement)
  • the standard errors (standard errors vs. t statistics)
  • the R-squared
  • the number of observations
  • Reporting the SSR and the standard error of the regression is sometimes a good idea, but it is not crucial.
• A single regression can be summarized in equation form.
• Several regressions are better summarized in one or more tables.
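Example 4.10: Salary-Pension Tradeoff of Teachers
• Economic model (standard wage model): totcomp = salary + benefits = salary × (1 + b/s), so $\log(totcomp) = \log(salary) + \log(1 + b/s) \approx \log(salary) + b/s$
• Econometric model: $\log(salary) = \beta_0 + \beta_1 (b/s) + \text{other factors}$
• Key variable: b/s
• Null hypothesis: $H_0: \beta_1 = -1$ against $H_1: \beta_1 \neq -1$
• Control variables: enroll, staff, droprate, gradrate
• Regression and reports
• Interpretations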
Chap 6. Multiple Regression Analysis: Further Issues
6.1 Effects of Data Scaling on OLS Statistics
• Units of measurement affect the estimators and statistics: coefficients, standard errors, t statistics, F statistics, and confidence intervals.
• When variables are rescaled, the coefficients, standard errors, confidence intervals, t statistics, and F statistics change in ways that preserve all measured effects and testing outcomes.
• For a variable in log form, rescaling changes only the intercept; the slope coefficients, $R^2$, and the test outcomes are unchanged.
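Beta Coefficients
• E.g. test score
• Standardized: Z = (X - mean)/standard deviation (sd)
• Standardized (beta) coefficient: $\hat{b}_j = (\hat{\sigma}_{x_j}/\hat{\sigma}_y)\,\hat{\beta}_j$
• Advantages: the scale of the regressors becomes irrelevant, so the importance of each variable can be judged by comparing the magnitudes of the beta coefficients.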
6.2 More on Functional Form
• Logarithmic functional forms
• Models with quadratics
• Models with interaction terms
6.2.1 More on Using Logarithmic Functional Forms
• Approximate: $\%\Delta\hat{y} \approx 100\,\hat{\beta}\,\Delta x$
• Accurate: $\%\Delta\hat{y} = 100\,[\exp(\hat{\beta}\,\Delta x) - 1]$
• Small % change: the difference is not crucial; large change: use the accurate formula
• Advantages of using logs:
  • leads to coefficients with appealing interpretations
  • when y > 0, models using log(y) as the dependent variable often satisfy the CLM assumptions more closely than models using the level of y; taking logs can mitigate or eliminate heteroskedastic or skewed conditional distributions
  • taking logs usually narrows the range of the variable, in some cases by a considerable amount. This makes estimates less sensitive to outlying (or extreme) observations on the dependent or independent variables.
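The gap between the approximate and the accurate percentage change in a log(y) model can be seen numerically. The following sketch compares $100\,\hat{\beta}\,\Delta x$ with $100\,[\exp(\hat{\beta}\,\Delta x) - 1]$; the coefficient values 0.02 and 0.30 are hypothetical.

```python
import math

def pct_change_approx(beta, dx=1.0):
    """Approximate % change in y for a log(y) model: 100 * beta * dx."""
    return 100 * beta * dx

def pct_change_exact(beta, dx=1.0):
    """Exact % change in predicted y: 100 * (exp(beta * dx) - 1)."""
    return 100 * (math.exp(beta * dx) - 1)

# Hypothetical coefficients: the two measures agree closely for small effects
small = (pct_change_approx(0.02), pct_change_exact(0.02))
# ... and diverge noticeably for large ones
large = (pct_change_approx(0.30), pct_change_exact(0.30))
```

For the small coefficient the two numbers differ by only a few hundredths of a percentage point, while for the large one the approximation understates the exact change by several points.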
Some Standard Rules of Thumb for Taking Logs
• Logs are often taken for variables that take large positive values:
  • positive dollar amounts: wages, salaries, firm sales, and firm market value
  • large integer values: population, total number of employees, and school enrollment
• Variables that are measured in years, such as education, experience, tenure, age, and so on, usually appear in their original form.
• A variable that is a proportion or a percent, such as the unemployment rate, the participation rate in a pension plan, the percentage of students passing a standardized exam, or the arrest rate on reported crimes, can appear in either original or logarithmic form, although there is a tendency to use them in level forms.
  • Distinguish a percentage change from a percentage point change.
More on Taking Logs
• If a variable takes on zero or negative values, logs cannot be used directly. When y is nonnegative but can be zero, log(1 + y) is sometimes used:
  • The percentage change interpretations are often closely preserved, except for changes beginning at y = 0 (where the percentage change is not even defined).
• One drawback to using a dependent variable in logarithmic form is that it is more difficult to predict the original variable.
  • It is not legitimate to compare R2 from models where y is the dependent variable in one case and log(y) is the dependent variable in the other.
  • (see Section 6.4)
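6.2.2 Models with Quadratics
• Decreasing or increasing marginal effects: quadratic functions
  • $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2$, with marginal effect $\Delta\hat{y}/\Delta x \approx \hat{\beta}_1 + 2\hat{\beta}_2 x$
  • Parabolic shape ($\hat{\beta}_1 > 0$, $\hat{\beta}_2 < 0$) and U-shape ($\hat{\beta}_1 < 0$, $\hat{\beta}_2 > 0$)
• Turning point: $x^* = |\hat{\beta}_1/(2\hat{\beta}_2)|$
  • If this turning point is beyond all but a small percentage of the observations in the sample, then this is not of much concern.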
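For a fitted quadratic $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2$, the marginal effect is approximately $\hat{\beta}_1 + 2\hat{\beta}_2 x$ and changes sign at $x^* = -\hat{\beta}_1/(2\hat{\beta}_2)$. A minimal sketch, with hypothetical coefficients chosen only to make the arithmetic easy:

```python
def marginal_effect(b1, b2, x):
    """Approximate effect of a one-unit change in x: b1 + 2 * b2 * x."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """Value of x where the marginal effect changes sign: -b1 / (2 * b2)."""
    return -b1 / (2 * b2)

# Hypothetical coefficients: positive linear term, negative quadratic term
b1, b2 = 0.30, -0.006
x_star = turning_point(b1, b2)            # about 25
effect_at_10 = marginal_effect(b1, b2, 10)  # still positive before the turning point
effect_at_40 = marginal_effect(b1, b2, 40)  # negative after the turning point
```

Checking how many sample observations lie beyond `x_star` indicates whether the downward-sloping part of the parabola matters in practice.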

Other forms:
• Using quadratics along with logarithms: some care is needed in making a useful interpretation with a dependent variable in log form and an explanatory variable entering as a quadratic.
• A nonconstant elasticity: double-log model with quadratics
• Other polynomial terms: a cubic and even a quartic term
6.2.3 Models with Interaction Terms

Interaction Effects:
 Sometimes it is natural for the partial effect, elasticity, or
semi-elasticity of the dependent v. with respect to an
explanatory v. to depend on the magnitude of yet another
explanatory v.
Example 6.3: Effects of Attendance on Final Exam Performance
• Dependent variable: final exam performance
  • Standardized outcome on a final exam (stndfnl)
  • It is easier to interpret a student's performance relative to the rest of the class.
• Explanatory variables:
  • Percentage of classes attended (atndrte)
  • Prior college grade point average (priGPA), and ACT score
• Functional form:
  • Quadratics in priGPA and ACT
  • Class attendance might have a different effect for students who have performed differently in the past, as measured by priGPA: an interaction between priGPA and atndrte.
• Econometric model: $stndfnl = \beta_0 + \beta_1 atndrte + \beta_2 priGPA + \beta_3 ACT + \beta_4 priGPA^2 + \beta_5 ACT^2 + \beta_6\, priGPA \cdot atndrte + u$
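Example 6.3: Regression Results, Inference and Interpretations
• Sample regression function: the estimated coefficient on atndrte is -.0067 and the estimated coefficient on priGPA·atndrte is .0056.
• Joint hypothesis: the coefficients on atndrte and on priGPA·atndrte are both zero (F test).
• What is the partial effect of atndrte on stndfnl? It is (coefficient on atndrte) + (coefficient on priGPA·atndrte) × priGPA.
  • We must plug in interesting values of priGPA to obtain the partial effect.
  • At the mean of priGPA (2.59), the effect of atndrte on stndfnl is -.0067 + .0056 × (2.59) = .0078.
  • Meaning: a 10 percentage point increase in atndrte increases stndfnl by .078 standard deviations from the mean final exam score.
• Statistical significance: run a new regression.
  • Replace priGPA × atndrte with (priGPA - 2.59) × atndrte.
  • The coefficient on atndrte is then the partial effect at priGPA = 2.59, and the regression gives its standard error, .0026, which yields t = .0078/.0026 = 3.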
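The arithmetic of this example is easy to reproduce. The sketch below uses only the numbers quoted above (-.0067, .0056, 2.59, .0026); the function name is a hypothetical helper, and the rounding step mirrors the four-decimal rounding used on the slide.

```python
def partial_effect_atndrte(b_atndrte, b_interaction, pri_gpa):
    """Partial effect of atndrte on stndfnl at a given priGPA."""
    return b_atndrte + b_interaction * pri_gpa

# Coefficients and mean priGPA as quoted in the example
effect = partial_effect_atndrte(-0.0067, 0.0056, 2.59)   # about .0078
effect_10_points = 10 * effect                            # about .078 standard deviations
t_stat = round(effect, 4) / 0.0026                        # about 3
```

Evaluating the same function at other values of priGPA shows how the attendance effect grows for students with stronger prior grades.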
6.3 More on Goodness-of-Fit and Selection of Regressors
R-squared and adjusted R-squared:
• $\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - (1 - R^2)\frac{n-1}{n-k-1}$
• When a new independent variable is added to a regression, SSR and (n - k - 1) both decrease, so $\bar{R}^2$ can go up or down.
• $\bar{R}^2$ increases if, and only if, the t statistic on the new variable is greater than one in absolute value. (An extension of this is that $\bar{R}^2$ increases when a group of variables is added to a regression if, and only if, the F statistic for joint significance of the new variables is greater than unity.)
• A negative $\bar{R}^2$ indicates a very poor model fit relative to the number of degrees of freedom.
• Note that it is $R^2$, not $\bar{R}^2$, that appears in the F statistic.
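The adjusted R-squared, $\bar{R}^2 = 1 - (1 - R^2)(n-1)/(n-k-1)$, penalizes additional regressors. A brief sketch with hypothetical values shows that it can fall when a nearly useless variable is added and can even be negative:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical regressions on the same sample of n = 50 observations
base = adjusted_r2(0.300, n=50, k=3)
with_weak_regressor = adjusted_r2(0.301, n=50, k=4)   # R2 barely rises, adjusted R2 falls

# Hypothetical poor fit with few degrees of freedom: adjusted R2 is negative
poor_fit = adjusted_r2(0.02, n=20, k=5)
```

The ordinary R-squared never decreases when a regressor is added, which is why it cannot be used this way.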
Using $\bar{R}^2$ to Choose Between Nonnested Models
• To choose a model without redundant independent variables:
  • Different functional forms (different explanatory variables)
• Limitation in using $\bar{R}^2$ to choose between nonnested models: we cannot use it to choose between different functional forms for the dependent variable, because the models are then fitting two separate dependent variables.
Controlling for Too Many Factors in Regression Analysis
• Overemphasizing goodness-of-fit can lead to controlling for too many factors, e.g. a regression of log(price) on log(assess), log(lotsize), log(sqrft), and bdrms.
• Adding regressors to reduce the error variance: we should always include independent variables that affect y and are uncorrelated with all of the independent variables of interest.
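6.4 Prediction and Residual Analysis
• Confidence intervals for a prediction from the OLS regression line
• The parameter of interest: $\theta_0 = \beta_0 + \beta_1 c_1 + \dots + \beta_k c_k = E(y \mid x_1 = c_1, \dots, x_k = c_k)$
• The estimator of $\theta_0$: $\hat{\theta}_0 = \hat{\beta}_0 + \hat{\beta}_1 c_1 + \dots + \hat{\beta}_k c_k$
• The standard error of $\hat{\theta}_0$: run the regression of y on $(x_1 - c_1), \dots, (x_k - c_k)$; the intercept term and its standard error are $\hat{\theta}_0$ and $\mathrm{se}(\hat{\theta}_0)$.
• Unobserved error; prediction error: $\hat{e}^0 = y^0 - \hat{y}^0$, with $\mathrm{se}(\hat{e}^0) = \{[\mathrm{se}(\hat{y}^0)]^2 + \hat{\sigma}^2\}^{1/2}$
Residual Analysis
• Residual analysis: checking whether the actual value of the dependent variable is above or below the predicted value; that is, examining the residuals for the individual observations.
Predicting y When log(y) Is the Dependent Variable
• Prediction: $\hat{y} = \exp(\hat{\sigma}^2/2)\,\exp(\widehat{\log y})$, which relies on normality of u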
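When log(y) is the dependent variable, simply exponentiating the predicted log(y) systematically underestimates the expected value of y. Under the CLM assumptions (normal errors), the usual adjustment multiplies by $\exp(\hat{\sigma}^2/2)$. The sketch below illustrates that adjustment; the predicted log value 2.0 and the error variance estimate 0.25 are hypothetical.

```python
import math

def predict_level(log_y_hat, sigma2_hat):
    """Predict y from a log(y) model, assuming normal errors:
    y_hat = exp(sigma2_hat / 2) * exp(log_y_hat)."""
    return math.exp(sigma2_hat / 2) * math.exp(log_y_hat)

# Hypothetical fitted value of log(y) and estimated error variance
naive = math.exp(2.0)                    # ignores the adjustment factor
adjusted = predict_level(2.0, 0.25)      # scaled up by exp(0.125)
```

If normality of the errors is doubtful, this adjustment factor is only an approximation and other estimators of the scale factor may be preferable.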
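The Goodness-of-Fit Measure in the y Model and the log(y) Model
• It is not legitimate to compare R2 from models where y is the dependent variable in one case and log(y) is the dependent variable in the other.
• Solution: from the log(y) model, obtain the predicted values $\hat{y}$ of y and compute the squared correlation between $\hat{y}$ and the actual y; this can be compared with the R-squared from the model for y.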
Chap 7. Multiple Regression Analysis with Qualitative Information: Binary (or Dummy) Variables
7.1 Dummy Explanatory Variables
• Describing dummy (binary) variables: assigned the values 0 and 1
• Meaning: e.g. $wage = \beta_0 + \delta_0 female + \beta_1 educ + u$
  • $\delta_0$: Is there discrimination against women? It is the difference in hourly wage between females and males, given the same amount of education (and the same error term u).
  • Intercept shift
• Dummy variable trap: including a dummy for every group together with an overall intercept creates perfect collinearity.
  • Keep track of which group is the base (benchmark) group.
  • We will always include an overall intercept for the base group.
• Nothing changes about the mechanics of OLS or the statistical theory when some of the independent variables are defined as dummy variables. The only difference with what we have done up until now is in the interpretation of the coefficient on the dummy variable.
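Interpreting Coefficients on Dummy Explanatory Variables When the Dependent Variable Is log(y)
• Percentage interpretation: the coefficient on a dummy variable, multiplied by 100, is approximately the percentage difference in y, holding other factors fixed; the exact percentage difference is $100\,[\exp(\hat{\beta}_1) - 1]$.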
7.2 The Binary Dependent Variable: The Linear Probability Model
• So far the dependent variable y has had quantitative meaning. What happens when y is binary?
• LPM: $P(y = 1 \mid x) = E(y \mid x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
• Interpretation of the OLS coefficients:
  • In the LPM, $\beta_j$ measures the change in the probability of success when $x_j$ changes, holding other factors fixed: $\Delta P(y = 1 \mid x) = \beta_j \Delta x_j$
  • Example: the coefficient on educ means that, everything else in (7.29) held fixed, another year of education increases the probability of labor force participation by .038.
Limitations of the LPM
Some shortcomings:
• Predictions can be either less than zero or greater than one.
• Linearity implies a constant marginal effect, which is unrealistic: e.g. we expect a smaller marginal effect of subsequent children on the probability that a woman works.
• It usually works well for values of the independent variables that are near the averages in the sample.
• Heteroskedasticity: $\mathrm{Var}(y \mid x) = p(x)[1 - p(x)]$. The OLS estimators remain unbiased, but the usual t and F statistics are incorrect.
Dummy dependent and explanatory variables:
• The coefficient measures the predicted difference in probability when the dummy variable goes from zero to one.
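Two of these shortcomings can be illustrated with a toy linear probability model. In the sketch below only the .038 coefficient on educ comes from the example above; the intercept, the coefficient on the number of children, and the input values are hypothetical, chosen to show a fitted "probability" above one and the non-constant error variance p(1 - p).

```python
def lpm_predict(intercept, coefs, x):
    """Fitted probability from a linear probability model (not bounded to [0, 1])."""
    return intercept + sum(b * v for b, v in zip(coefs, x))

def lpm_error_variance(p):
    """Conditional variance of a binary outcome with success probability p."""
    return p * (1 - p)

# Hypothetical model: intercept 0.20, coefficients on [educ, kids]
intercept, coefs = 0.20, [0.038, -0.25]

p_typical = lpm_predict(intercept, coefs, [12, 1])   # near-average inputs: sensible value
p_extreme = lpm_predict(intercept, coefs, [25, 0])   # extreme inputs: exceeds one

# The error variance depends on x through p, so the model is heteroskedastic
var_mid, var_low = lpm_error_variance(0.5), lpm_error_variance(0.1)
```

The different values of `var_mid` and `var_low` are the reason the usual standard errors cannot be trusted in the LPM.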
7.3 Using Dummy Variables for Multiple Categories: Interpretation
• Adding a dummy variable married
  • Same "marriage premium" for men and women; the two dummies take the values (0,1); (1,1); (1,0); (0,0)
• Different "marriage premium"
  • Three dummy variables: marrmale, marrfem, and singfem, taking the values (1,0,0); (0,1,0); (0,0,1); (0,0,0), where (0,0,0) is the base group
• Ordinal variables: define dummy variables for each value or each category of the ordinal information
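Constructing the three group dummies from two binary variables can be sketched as follows. The helper name `group_dummies` is hypothetical; the base group, for which all three dummies are zero, is single men.

```python
def group_dummies(female, married):
    """Map the binary variables (female, married) to the three group dummies.
    Single men are the base group: all three dummies equal zero."""
    return {
        "marrmale": int(married == 1 and female == 0),
        "marrfem": int(married == 1 and female == 1),
        "singfem": int(married == 0 and female == 1),
    }

married_man = group_dummies(female=0, married=1)      # (1, 0, 0)
married_woman = group_dummies(female=1, married=1)    # (0, 1, 0)
single_woman = group_dummies(female=1, married=0)     # (0, 0, 1)
single_man = group_dummies(female=0, married=0)       # (0, 0, 0), the base group
```

Including a fourth dummy for single men alongside an intercept would fall into the dummy variable trap.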
Example 7.6 (7.1, 7.5): The Determination of log Hourly Wage
• Explanatory variables: educ, exper, tenure, marriage, gender
• Dummy variables:
  • Same "marriage premium"; (0,1); (1,1); (1,0); (0,0)
  • Different "marriage premium"; (1,0,0); (0,1,0); (0,0,1); (0,0,0)
  • Adding an interaction term
• Regression:
• Inference:
  • We can use this equation to obtain the estimated difference between any two groups.
  • Unfortunately, we cannot use it for testing whether the estimated difference between single and married women is statistically significant. The remedy is to choose one of these groups to be the base group and to reestimate the equation.
• Interpretation:
7.4 Interaction Effects Involving Dummy Variables
Adding an interaction term:
• The marriage premium depends on gender: add the interaction female·married to a model containing female and married; the rest of the regression is necessarily identical to (7.11).
• Equation (7.14) is just a different way of finding wage differentials across all gender-marital status combinations. It has no real advantages over (7.11); in fact, equation (7.11) makes it easier to test for differentials between any group and the base group of single men.
Interaction Effects: Differences in Slopes
Adding an interaction term: differences in slopes
• The return to education depends on gender: $\log(wage) = (\beta_0 + \delta_0 female) + (\beta_1 + \delta_1 female)\,educ + u$
Hypothesis tests:
• The return to education is the same for women and men: $H_0: \delta_1 = 0$ (t test)
• Average wages are identical for men and women who have the same levels of education: $H_0: \delta_0 = 0, \delta_1 = 0$ (F test)
7.4.2 Testing for Differences in Regression Functions Across Groups
• H0: two populations or groups follow the same regression function, against the alternative that one or more of the slopes differ across the groups.
• Chow statistic: $F = \frac{SSR_P - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k+1)}{k+1}$, where $SSR_P$ comes from the pooled regression and $SSR_1$, $SSR_2$ come from the separate regressions for each group
• Caution: there is no simple R2 form of the test if separate regressions have been estimated for each group; the R2 form of the test can be used only if interactions have been included to create the unrestricted model.
• One important limitation of the Chow test, regardless of the method used to implement it, is that the null hypothesis allows for no differences at all between the groups.
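The Chow statistic compares the pooled sum of squared residuals with the sum from the two separate group regressions: $F = \frac{SSR_P - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k+1)}{k+1}$. A minimal sketch with made-up sums of squared residuals (all numbers and the function name are hypothetical):

```python
def chow_statistic(ssr_pooled, ssr_1, ssr_2, n, k):
    """Chow F statistic for equality of regression functions across two groups.
    k is the number of regressors, so k + 1 restrictions are tested."""
    ssr_ur = ssr_1 + ssr_2                       # unrestricted SSR: separate regressions
    return ((ssr_pooled - ssr_ur) / ssr_ur) * ((n - 2 * (k + 1)) / (k + 1))

# Hypothetical values: n = 210 observations in total, k = 4 regressors
f = chow_statistic(ssr_pooled=100.0, ssr_1=40.0, ssr_2=50.0, n=210, k=4)
```

The result would be compared with the critical value of the F distribution with (k + 1, n - 2(k + 1)) degrees of freedom.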

7.5 Policy Analysis and Program Evaluation with Dummy Variables
• Policy analysis; program evaluation
  • Control group; experimental (treatment) group
  • Be careful to include factors that might be systematically related to the binary independent variable of interest.
• Self-selection problems:
  • The term is used generally when a binary indicator of participation might be systematically related to unobserved factors.
  • It is another way that an explanatory variable can be endogenous.
• Solutions:
  • Data
  • More advanced methods
Example: The Effect of Job Training Grants on Worker Productivity
• Consider again the Holzer et al. (1993) study, where we are now interested in the effect of the job training grants on worker productivity (as opposed to the amount of job training, Example 7.3).
References
Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, Chap 4-7.