Econometrics in Practice
Donggyu Sul
2015
Abstract
1 Introduction: Types of Data
There are three types of data: cross-sectional data, time series data, and panel data.
1.0.1 Cross-sectional data ($i = 1, \ldots, n$)
- True cross-sectional data: time invariant. Examples: personality, preference, I.Q., blood type.
- Pseudo cross-sectional data: time varying. Examples: personal income, real estate prices.
1.0.2 Time series data ($t = 1, \ldots, T$)
- Stationary data: the variance is not time varying. Examples: interest rates, Texas state income growth rates.
- Nonstationary data: the variance is time varying and often increases over time. Examples: US income level, stock prices, exchange rates.
1.0.3 Panel data
Panel data combine the cross-sectional and the time series dimensions.
- Short panel: large n but small T.
- Long panel: small n but large T.
Depending on the type of data, the statistics of interest generally differ.
Part I
Cross-Sectional Data
Parameters of interest: central tendency and the shape of the distribution.
- Learn the difference between true and pseudo cross-sectional data.
- Learn how to compare two samples (independent and dependent).
- Learn how to model a linear regression.
- Learn the role of missing variables in regression results.
2 Central Tendency
Mean, median, quantile mean.
Example: comparison of female and male income.
2.1 Mean: Statistics
Sample mean:
$$\hat\mu = \frac{1}{n}\sum_{i=1}^n y_i$$
Weighted mean:
$$\hat\mu_{we} = \sum_{i=1}^n \omega_i y_i, \quad \text{where } \sum_{i=1}^n \omega_i = 1.$$
Trimmed mean: discard a small percentage $\alpha = m/n$ of the largest and the smallest values before calculating the mean. Let $y_1 \le y_2 \le \cdots \le y_n$. Then
$$\hat\mu_{trim} = \frac{1}{n-2m}\sum_{i=m+1}^{n-m} y_i.$$
Winsorized mean: set $y_1 = \cdots = y_m = y_{m+1}$ and $y_{n-m+1} = \cdots = y_n = y_{n-m}$, then take the mean of all $n$ values.
Example: $n = 50$ and $\alpha = 2\%$, so $m = 1$.
Original sample (from smallest to largest): $y_1, y_2, y_3, y_4, y_5, \ldots, y_{46}, y_{47}, y_{48}, y_{49}, y_{50}$
2% trimmed mean: $y_1$ and $y_{50}$ are dropped, leaving $y_2, y_3, y_4, y_5, \ldots, y_{46}, y_{47}, y_{48}, y_{49}$ (average over 48 values)
2% winsorized mean: $y_2, y_2, y_3, y_4, y_5, \ldots, y_{46}, y_{47}, y_{48}, y_{49}, y_{49}$ (average over 50 values)
Googling: Trimmed Mean PCE Inflation Rate http://www.dallasfed.org/research/pce/
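The four location estimators above can be sketched in Python. This is a minimal illustration, assuming the data fit in a list and that m = αn is already an integer (m observations are trimmed or winsorized at each end); the function names are hypothetical.

```python
def sample_mean(y):
    """Plain sample mean (1/n) * sum(y_i)."""
    return sum(y) / len(y)


def weighted_mean(y, w):
    """Weighted mean sum(w_i * y_i); the weights are assumed to sum to one."""
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(wi * yi for wi, yi in zip(w, y))


def trimmed_mean(y, m):
    """Drop the m smallest and m largest values, then average the remaining n - 2m."""
    s, n = sorted(y), len(y)
    if not 0 <= 2 * m < n:
        raise ValueError("need 0 <= 2m < n")
    return sum(s[m:n - m]) / (n - 2 * m)


def winsorized_mean(y, m):
    """Replace the m smallest values by y_(m+1) and the m largest by y_(n-m), then average all n."""
    s, n = sorted(y), len(y)
    if not 0 <= 2 * m < n:
        raise ValueError("need 0 <= 2m < n")
    w = [s[m]] * m + s[m:n - m] + [s[n - m - 1]] * m
    return sum(w) / n


y = [1, 2, 3, 4, 100]
print(sample_mean(y), trimmed_mean(y, 1), winsorized_mean(y, 1))  # 22.0 3.0 3.0
```

For the n = 50, α = 2% example, the calls would be `trimmed_mean(y, 1)` and `winsorized_mean(y, 1)`.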
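2.2 Mean: Statistical Inference
How do we evaluate whether or not the sample mean is meaningful? Equivalently, how do we test whether the estimated sample mean differs from a reference value (for example, zero or unity)?
Basic idea: use the central limit theorem, which is a powerful statistical device.
2.2.1 Central limit theorem (CLT)
Let $\hat\mu_n = \frac{1}{n}\sum_{i=1}^n y_i$. As $n \to \infty$, $\sqrt{n}(\hat\mu_n - \mu) \xrightarrow{d} N(0, \sigma^2)$; that is, for large $n$, $\hat\mu_n - \mu$ is approximately distributed as $N(0, \sigma^2/n)$.
Interpretation:
1. $\xrightarrow{d}$: the distribution (d) approaches.
2. $N(0, v)$: normal distribution with variance $v$.
3. Normal distribution: its probability density function is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right].$$
The density value $f(c)$ measures how likely $x$ is to fall near $c$; for a continuous variable, $\Pr(x = c)$ itself is zero.
(a) Example: the density at $x = 0$.
(b) If $\mu = 0$ and $\sigma^2 = 1$ (standard normal distribution), then $f(0)$ becomes
$$\frac{1}{\sqrt{2\pi}}\exp\left[-\frac{(0-0)^2}{2}\right] = \frac{1}{\sqrt{2\pi}}\exp(0) = \frac{1}{\sqrt{2\pi}} = 0.39894.$$
(c) If $\mu = 0$ and $\sigma^2 = 0.5$, then $f(0)$ becomes
$$\frac{1}{\sqrt{2\pi \times 0.5}}\exp\left[-\frac{(0-0)^2}{2 \times 0.5}\right] = \frac{1}{\sqrt{\pi}}\exp(0) = 0.5642.$$
4. Why $\sigma^2/n$? As $n \to \infty$, the variance goes to zero. As the variance approaches zero, the distribution of $\hat\mu_n$ collapses onto $\mu$: the probability that $\hat\mu_n$ lies within any small distance of $\mu$ becomes unity.
5. $\mu$: the unknown mean.
6. $\hat\mu_n$: the sample mean with sample size $n$.
7. $n \to \infty$: the sample size becomes infinitely large.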
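A small numerical check of items 3 and 4, written as an illustrative sketch: `normal_pdf` evaluates the density formula at x = 0 for σ² = 1 and σ² = 0.5, and a Monte Carlo experiment (assuming i.i.d. Uniform(0, 1) draws, whose variance is 1/12) shows the variance of the sample mean shrinking like σ²/n.

```python
import math
import random
import statistics


def normal_pdf(x, mu=0.0, var=1.0):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def var_of_sample_mean(n, reps=2000, seed=0):
    """Monte Carlo variance of the mean of n Uniform(0, 1) draws (population variance 1/12)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.pvariance(means)


print(round(normal_pdf(0.0, 0.0, 1.0), 5))   # 0.39894
print(round(normal_pdf(0.0, 0.0, 0.5), 4))   # 0.5642
for n in (10, 50, 250):
    print(n, var_of_sample_mean(n), 1 / (12 * n))   # simulated vs. sigma^2 / n
```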
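2.2.2 Statistical Testing
Central limit theorem (CLT): if $y_i$ is i.i.d., then as $n \to \infty$,
$$t_{\hat\mu} = \frac{\sqrt{n}(\hat\mu_n - \mu)}{\sqrt{\hat\sigma^2}} \xrightarrow{d} N(0, 1).$$
What is i.i.d.? Independently and identically distributed.
- Independent: $y_i$ is independent of $y_j$ for $i \neq j$ (in particular, $y_i$ is not correlated with $y_j$).
- Identical: every $y_i$ has the same distribution (in particular, the same variance as $y_j$).
T-statistics
1. Calculate the sample mean $\hat\mu_n$.
2. Calculate the sample variance $\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \hat\mu_n)^2$.
Why $(n-1)$ rather than $n$? You will learn it later (sure?). Want to know now? Do the following, taking $Ey_i = 0$ for simplicity.
(a) Show that $E\sum_{i=1}^n \left(y_i - \frac{1}{n}\sum_{i=1}^n y_i\right)^2 = E\sum_{i=1}^n y_i^2 - nE\left(\frac{1}{n}\sum_{i=1}^n y_i\right)^2$.
(b) Assume that $Ey_i^2 = \sigma^2$. Then $E\sum_{i=1}^n y_i^2 = \sum_{i=1}^n Ey_i^2 = ?$
(c) Show that $\left(\frac{1}{n}\sum_{i=1}^n y_i\right)^2 = \frac{1}{n^2}\sum_{i=1}^n y_i^2 + \frac{1}{n^2}\sum_{i=1}^n\sum_{j \neq i} y_i y_j$.
(d) Show that $E\left(\frac{1}{n}\sum_{i=1}^n y_i\right)^2 = \sigma^2/n$ if $Ey_iy_j = 0$ for $i \neq j$.
Combining (a) through (d), the expected sum of squared deviations is $n\sigma^2 - \sigma^2 = (n-1)\sigma^2$, so dividing by $n-1$ (not $n$) gives an unbiased estimator of $\sigma^2$.
3. Calculate the t-statistic $t_{\hat\mu} = \sqrt{n}(\hat\mu_n - \mu_0)/\hat\sigma$, where $\mu_0$ is the reference value under the null hypothesis.
4. Compare it with the critical value. (What is this?)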
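Steps 1 to 3 and the (n − 1) exercise can be checked with the following sketch. `one_sample_t` is a hypothetical helper; `average_variance_estimates` simulates normal data with variance one (an assumption chosen for illustration) and averages the two variance estimators, which should come out near (n − 1)/n and 1.

```python
import math
import random
import statistics


def one_sample_t(y, mu0=0.0):
    """Steps 1-3: sample mean, sample variance with divisor n - 1, and the t-statistic."""
    n = len(y)
    mu_hat = sum(y) / n                                   # step 1
    s2 = sum((yi - mu_hat) ** 2 for yi in y) / (n - 1)    # step 2
    t = math.sqrt(n) * (mu_hat - mu0) / math.sqrt(s2)     # step 3
    return mu_hat, s2, t


def average_variance_estimates(n=5, reps=20000, seed=1):
    """Monte Carlo averages of the divisor-n and divisor-(n - 1) variance estimators."""
    rng = random.Random(seed)
    by_n = by_n_minus_1 = 0.0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]       # true variance is 1
        m = sum(y) / n
        ss = sum((v - m) ** 2 for v in y)
        by_n += ss / n
        by_n_minus_1 += ss / (n - 1)
    return by_n / reps, by_n_minus_1 / reps


y = [2.1, 1.7, 2.6, 2.4, 1.9, 2.8, 2.2, 2.5]
print(one_sample_t(y, mu0=2.0))
print(average_variance_estimates())   # roughly (0.8, 1.0) for n = 5
```

Python's `statistics.variance` also uses the divisor n − 1, so it agrees with step 2.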
Testing (Size of the Test)
Plot a bell-shaped standard normal density here.
5% and 95% values: −1.65 and 1.65. Meaning: if a t-value is inside this range, we cannot reject that $\hat\mu$ is the same as $\mu$.
– Any mistake? Yes, we allow 5% mistakes on each side, so 10% in total. Too large?
2.5% and 97.5% values: −1.96 and 1.96.
– 5% total mistake. Still huge? Think about this. You are an MD, and the mistake is that you fail to find a cancer. 5%: huge. Actual: 50%?
– What if you are a judge? Is −1.96 a good number for rejecting that $\hat\mu$ is $\mu$?
We call the total rejection probability (10% or 5% above) the size of the test: the probability of rejection when the null is true.
In physics, the mistake rate should be less than 0.0001%. In statistics, it is usually 5%.
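The critical values and the meaning of "size" can be illustrated as follows, a sketch assuming normal data and n = 30. Python's `statistics.NormalDist().inv_cdf` gives the standard normal quantiles, and `rejection_rate` (a hypothetical helper) counts how often a true null H0: μ = 0 is rejected.

```python
import random
from statistics import NormalDist

z = NormalDist()  # standard normal N(0, 1)
print(round(z.inv_cdf(0.95), 3), round(z.inv_cdf(0.975), 3))   # 1.645 1.96


def rejection_rate(crit, n=30, reps=5000, seed=2):
    """Share of two-sided rejections |t| > crit when H0: mu = 0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(y) / n
        s2 = sum((v - m) ** 2 for v in y) / (n - 1)
        if abs(m / (s2 / n) ** 0.5) > crit:
            rejections += 1
    return rejections / reps


print(rejection_rate(1.645), rejection_rate(1.96))   # near 0.10 and 0.05
```

With n = 30 the exact t distribution has slightly fatter tails than N(0, 1), so the simulated rejection rates come out a little above 10% and 5%.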
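What if $y_i$ is not independent?
- In cross-sectional data there is no solution if we don't know how to order $y_i$.
- Spatial analysis: order $y_i$ by location, and assume that the dependence increases as $y_i$ gets closer to $y_j$.
- What else? Not much.
What if $y_i$ is not identical?
Basically, it is okay. Define $\bar\sigma^2 = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n \sigma_i^2$. Statistically, $\hat\sigma^2 \to \bar\sigma^2$.
Standard Deviation and Standard Error
Standard deviation (SD): $\hat\sigma = \sqrt{\hat\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (y_i - \hat\mu_n)^2}$.
Standard error (SE): $s_{\hat\mu} = \hat\sigma/\sqrt{n}$. So usually $t_{\hat\mu} = \hat\mu/s_{\hat\mu}$ if $\mu = 0$ under the null.
How to estimate the variance of the weighted mean
$$\hat\sigma^2_{\mu,we} = \sum_{i=1}^n \omega_i^2 (y_i - \hat\mu_{we})^2,$$
or alternatively,
$$\hat\sigma^2_{\mu,we} = \frac{\sum_{i=1}^n \omega_i^2}{1 - \sum_{i=1}^n \omega_i^2}\sum_{i=1}^n \omega_i (y_i - \hat\mu_{we})^2.$$
The CLT works with $\hat\mu_{we}$ and $\hat\sigma_{\mu,we}$ as well.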
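A sketch of the weighted mean with both standard-error formulas, assuming the raw weights are positive and are normalized to sum to one before use (the helper name is hypothetical). With equal weights the second formula reproduces the usual $\hat\sigma^2/n$, and the first gives the same quantity with divisor n instead of n − 1.

```python
import math


def weighted_mean_se(y, w):
    """Weighted mean and the two standard-error formulas above (weights normalized first)."""
    total = sum(w)
    w = [wi / total for wi in w]                         # enforce sum(w) = 1
    mu = sum(wi * yi for wi, yi in zip(w, y))
    v1 = sum(wi ** 2 * (yi - mu) ** 2 for wi, yi in zip(w, y))
    sw2 = sum(wi ** 2 for wi in w)
    v2 = sw2 / (1 - sw2) * sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    return mu, math.sqrt(v1), math.sqrt(v2)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(weighted_mean_se(y, [1, 1, 1, 1, 1]))   # equal weights
print(weighted_mean_se(y, [1, 2, 3, 2, 1]))   # more weight in the middle
```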
2.3 Median
The median is the number separating the higher half of a data sample from the lower half.
When is the median used? Examples:
- House prices. Why?
- Household income. Why not the mean?
The median is a better measure of central tendency when a distribution is asymmetric or skewed.
Example: {1, 2, 2, 3, 4, 5, 20}. Sample mean: (1 + 2 + 2 + 3 + 4 + 5 + 20)/7 = 37/7 = 5.2857. Median: 3.
Is the median more efficient than the mean?
- What is efficient? Having a smaller variance.
- The answer is "No" in general. For example, with normally distributed data the variance of the sample median is about $\pi/2 \approx 1.57$ times that of the sample mean in large samples.
2.4 Quantile Mean
Calculate the 25% quantile and the 75% quantile, and then take their average.
Example: {1, 2, 3, 4, 5, 6, 7, 8, 9, 100, 200}.
25% quantile = 3, 75% quantile = 9, so the quantile mean is (3 + 9)/2 = 6. Median = 6. Mean = ?
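The median and quantile-mean examples can be reproduced with Python's standard `statistics` module. Quantiles are not uniquely defined; the default "exclusive" method of `statistics.quantiles` happens to give 3 and 9 for this example, but other conventions can give different values. The simulation (normal data, an assumption) illustrates the efficiency comparison in 2.3.

```python
import random
import statistics


def quantile_mean(y):
    """Average of the 25% and 75% quantiles (statistics.quantiles, default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(y, n=4)
    return (q1 + q3) / 2


def median_to_mean_variance_ratio(n=101, reps=3000, seed=3):
    """Monte Carlo Var(median) / Var(mean) for normal data; theory says about pi / 2."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(y))
        medians.append(statistics.median(y))
    return statistics.pvariance(medians) / statistics.pvariance(means)


a = [1, 2, 2, 3, 4, 5, 20]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100, 200]
print(statistics.mean(a), statistics.median(a))
print(statistics.quantiles(b, n=4), quantile_mean(b), statistics.median(b), statistics.mean(b))
print(median_to_mean_variance_ratio())
```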
3 Distribution
- Learn how to plot a probability density function (pdf), or distribution, and a cumulative distribution function (cdf).
- Learn how to interpret a pdf and how to use a cdf.
Basic: Histogram
1. Select the bandwidth, or interval size. A smaller interval makes the histogram noisier (more jagged); a larger interval makes it too coarse.
2. Count the number of observations in each interval.
3. Plot the histogram.
Advanced: Kernel Estimation
Kernel density estimation is nonparametric: it makes no parametric assumption about the distribution of the data. The formula is given by
$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^n K\left(\frac{x - x_i}{h}\right),$$
where $K(\cdot)$ is the kernel function and $h$ is the bandwidth.
Basically, one chooses a particular kernel function (normal, uniform, or Epanechnikov, for example), calculates a small density "bump" around each observation, and then sums up all the bumps.
Among the many kernel functions, the Epanechnikov kernel is known as the most efficient one.
Do you need to plot one yourself? Use a histogram.
In many cases, you don't need to plot a pdf by yourself. The most important question is whether or not the distribution is skewed. If a distribution is really skewed, then the median or the quantile mean is a better measure of central tendency than the sample mean.
Next, we will learn how to plot an empirical cumulative distribution function.
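A minimal sketch of a histogram and of the kernel estimator $\hat f(x)$, with the Gaussian kernel and the Epanechnikov kernel $K(u) = 0.75(1 - u^2)$ for $|u| \le 1$. The data, the interval width 1.0 and the bandwidth h = 0.5 are arbitrary illustrative choices, not recommendations; a numerical integration checks that each estimate integrates to about one.

```python
import math


def histogram(y, width):
    """Counts per interval [lo + k*width, lo + (k+1)*width), with lo = min(y)."""
    lo = min(y)
    counts = {}
    for v in y:
        k = int((v - lo) // width)
        counts[k] = counts.get(k, 0) + 1
    return {(lo + k * width, lo + (k + 1) * width): c for k, c in sorted(counts.items())}


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def epanechnikov_kernel(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kde(x, data, h, kernel=gaussian_kernel):
    """f_hat(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


data = [1.2, 1.9, 2.3, 2.8, 3.1, 3.3, 4.0, 5.6]
print(histogram(data, width=1.0))
grid = [-2 + 0.01 * i for i in range(1101)]          # -2.0 to 9.0 in steps of 0.01
for K in (gaussian_kernel, epanechnikov_kernel):
    area = sum(kde(g, data, 0.5, K) for g in grid) * 0.01
    print(K.__name__, round(area, 3))                 # each estimate integrates to about 1
```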
Empirical Cumulative Distribution Function
1. Sort the observations from the smallest to the largest.
2. Assign $1/n$ to the smallest, $2/n$ to the second smallest, ..., and $n/n$ to the largest.
3. Plot the sorted data against the sequence $(1/n, 2/n, \ldots, 1)$.
How to use the empirical CDF
- Testing whether or not observed samples follow a particular distribution. We call this a one-sample distribution test.
– Kolmogorov-Smirnov (KS) test
– Anderson-Darling test
– Watson test
– Cramer-von Mises test, etc.
- Suppose that $y_i \sim N(\mu, \sigma^2)$; then you can plot the theoretical cdf. Compare the theoretical cdf with the empirical one, and then test whether or not $y_i$ is actually normally distributed.
- You can also test whether or not two samples share the same distribution. We call this a two-sample distribution test. You can still use the KS test, comparing the maximum difference between the two empirical cdfs.
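The empirical CDF and the KS statistics can be sketched with the standard library; the helper names are hypothetical. `ks_crit_5pct` uses the common large-sample approximation with constant 1.36, so it is only rough for samples as small as the illustrative ones here. Also, if μ and σ² are estimated from the same data before the one-sample test, these critical values are too conservative (the Lilliefors variant addresses this).

```python
import bisect
import math
from statistics import NormalDist


def ecdf(sample):
    """Return the empirical CDF F_n(t) = #{y_i <= t} / n."""
    s, n = sorted(sample), len(sample)
    return lambda t: bisect.bisect_right(s, t) / n


def ks_one_sample(sample, cdf):
    """KS statistic D = sup_t |F_n(t) - F(t)| against a fully specified continuous cdf F."""
    s, n = sorted(sample), len(sample)
    return max(max((i + 1) / n - cdf(v), cdf(v) - i / n) for i, v in enumerate(s))


def ks_two_sample(x, y):
    """Two-sample KS statistic D = max_t |F_x(t) - F_y(t)|, evaluated at the pooled data points."""
    fx, fy = ecdf(x), ecdf(y)
    return max(abs(fx(t) - fy(t)) for t in list(x) + list(y))


def ks_crit_5pct(n, m=None):
    """Approximate large-sample 5% critical value: 1.36/sqrt(n), or 1.36*sqrt((n+m)/(n*m))."""
    return 1.36 / math.sqrt(n) if m is None else 1.36 * math.sqrt((n + m) / (n * m))


y1 = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1, -0.9, 2.1, 0.6, -0.2]
y2 = [1.1, 2.4, 0.9, 1.8, 2.9, 1.4, 0.5, 2.2, 1.7, 3.0]
d1 = ks_one_sample(y1, NormalDist(0.0, 1.0).cdf)     # H0: y1 ~ N(0, 1)
d2 = ks_two_sample(y1, y2)                            # H0: same distribution
print(d1, ks_crit_5pct(len(y1)), d2, ks_crit_5pct(len(y1), len(y2)))
```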
3.1 Exercise 1
Use PCE_Inf_90item.xls. Check the assignment sheet to find which years you need to use for this exercise.
1. Estimate the following statistics for each year: sample mean, median, quantile mean, 5% trimmed mean (lowest 5 and highest 5 dropped), 5% winsorized mean.
2. Using the sample mean and the sample variance, test whether or not the mean of the inflation rate is equal to 2% for each year.
3. Plot a histogram for each year.
4. Plot the distribution for each year by using at least two kernels.
5. Plot the empirical CDF.
6. Test whether or not the observed series are normally distributed (KS test).
7. Test whether or not the first-year sample shares the same distribution with the second-year sample. Do the same test with the second and third years.
4 Two-Sample Comparison
Several tests are available, but the point of interest is how the two samples differ from each other.
Examples: (medical) experimental studies
- Two groups: treated vs. controlled. The number of subjects is usually small (sometimes fewer than 20).
- Comparison: is the treated sample different from the controlled sample, and how does it differ? Plot the distributions if n is large enough.
- Location (central tendency) or distribution?
Key assumption: the two samples are not correlated. In other words, researchers should collect independent samples. If they can't, they have to control for the dependence. How? We will study this soon.
4.1 Nonparametric Tests (Exact Sample Theory)
Student's t-test requires that the sample is normally distributed. When n is small and fixed, we need the normality assumption; otherwise, we don't know the critical value of the t-statistic. Of course, when n is large, we can use the CLT. Nonparametric tests do not require any distributional assumption.
Equal distribution test: Kolmogorov-Smirnov (KS) test. Compare the empirical CDFs.
– $KS = \max_x |CDF_1(x) - CDF_2(x)|$, where $CDF_j(x)$ is the empirical CDF of sample $j$.
– As $n \to \infty$, the (suitably scaled) KS statistic has the Kolmogorov distribution.
– The two samples should be independent.
Rank test: Wilcoxon-Mann-Whitney test. Compare the ranks of the two samples.
– This rank-sum version is for unpaired samples; its paired-sample counterpart is the Wilcoxon signed-rank test.
– Combine the two samples and then rank them from the smallest to the largest.
– If the two samples share the same distribution, then the average ranks of the two samples should be about the same.
– However, equal average ranks do not mean equal distributions.
– Popularly used in practice, but often wrongly interpreted (as a test of equal distributions).
– The two samples should be independent.
Summary: if the two samples are not independent, you should not use the above tests. So if the sample size is small (fewer than 20) and the two samples are dependent, just toss a coin.
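A sketch of the Wilcoxon-Mann-Whitney rank-sum statistic for two independent samples. It uses average ranks for ties and the large-sample normal approximation without a tie correction (a simplification); with very small samples, exact tables should be used instead. The helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..N of the pooled values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1      # average of 1-based positions i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon-Mann-Whitney for independent samples (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                           # rank sum of the first sample
    u = r1 - n1 * (n1 + 1) / 2                     # Mann-Whitney U statistic
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, u, z


print(rank_sum_test([1.2, 3.4, 2.2, 5.1], [4.8, 6.0, 5.5, 3.9, 7.2]))
```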
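4.2 T-Test or Moments Test
4.2.1 Sample Moments:
1st: $\hat\mu = \frac{1}{n}\sum_{i=1}^n y_i$
2nd: $\hat\mu_2 = \frac{1}{n}\sum_{i=1}^n y_i^2$
3rd: $\hat\mu_3 = \frac{1}{n}\sum_{i=1}^n y_i^3$
4.2.2 Sample Central Moments:
1st: $\hat\mu = \frac{1}{n}\sum_{i=1}^n y_i$
2nd central: $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \left(y_i - \frac{1}{n}\sum_{i=1}^n y_i\right)^2$: variance
3rd central: $\hat S = \frac{1}{n}\sum_{i=1}^n \left(y_i - \frac{1}{n}\sum_{i=1}^n y_i\right)^3$: skewness
1. Variance: risk, volatility, etc. A large variance means more risk.
2. Skewness: a symmetric distribution has zero skewness.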
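4.2.3 Method of Moments
Let $(x_i, y_i)$, $i = 1, \ldots, n$, be paired observations. Set the null hypothesis
$$H_0: m_x = m_y,$$
where $m_x$ and $m_y$ are moments. Define $M = m_x - m_y$ (estimated by the sample moments), and let $V_M$ be the (asymptotic) variance of $\sqrt{n}M$. Then from the CLT, we have
$$t_M = \frac{\sqrt{n}M}{\sqrt{V_M}} \xrightarrow{d} N(0, 1).$$
Example 1: Mean. Let
$$M = \hat\mu_x - \hat\mu_y = \frac{1}{n}\sum_{i=1}^n x_i - \frac{1}{n}\sum_{i=1}^n y_i, \qquad V_M = V_x + V_y - 2\,Cov,$$
where
$$V_y = \frac{1}{n-1}\sum_{i=1}^n\left(y_i - \frac{1}{n}\sum_{i=1}^n y_i\right)^2, \quad V_x = \frac{1}{n-1}\sum_{i=1}^n\left(x_i - \frac{1}{n}\sum_{i=1}^n x_i\right)^2,$$
$$Cov = \frac{1}{n-1}\sum_{i=1}^n\left(y_i - \frac{1}{n}\sum_{i=1}^n y_i\right)\left(x_i - \frac{1}{n}\sum_{i=1}^n x_i\right).$$
Example 2: Second Moment. Let
$$M = \hat\mu_{x^2} - \hat\mu_{y^2} = \frac{1}{n}\sum_{i=1}^n x_i^2 - \frac{1}{n}\sum_{i=1}^n y_i^2, \qquad V_M = V_{x^2} + V_{y^2} - 2\,Cov,$$
where
$$V_{y^2} = \frac{1}{n-1}\sum_{i=1}^n\left(y_i^2 - \frac{1}{n}\sum_{i=1}^n y_i^2\right)^2, \quad V_{x^2} = \frac{1}{n-1}\sum_{i=1}^n\left(x_i^2 - \frac{1}{n}\sum_{i=1}^n x_i^2\right)^2,$$
$$Cov = \frac{1}{n-1}\sum_{i=1}^n\left(y_i^2 - \frac{1}{n}\sum_{i=1}^n y_i^2\right)\left(x_i^2 - \frac{1}{n}\sum_{i=1}^n x_i^2\right).$$
Summary: the t-test, or method of moments, is free from the independence assumption, because the covariance term accounts for the dependence between the paired samples. But this method requires a large sample. How large? n should be larger than 20 at the very least.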
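The method-of-moments test for paired samples can be sketched as below, with `power=1` for Example 1 and `power=2` for Example 2 (the helper name and the data are hypothetical). For the mean, $V_x + V_y - 2Cov$ is exactly the sample variance of $d_i = x_i - y_i$, so the test coincides with a one-sample t-test on the differences.

```python
import math
import statistics


def paired_moment_test(x, y, power=1):
    """t_M = sqrt(n) * M / sqrt(V_M) for H0: E[x^power] = E[y^power], paired (x_i, y_i)."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must be paired observations of equal length")
    a = [v ** power for v in x]
    b = [v ** power for v in y]
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((v - ma) ** 2 for v in a) / (n - 1)
    vb = sum((v - mb) ** 2 for v in b) / (n - 1)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)
    m = ma - mb
    v_m = va + vb - 2 * cov          # V_x + V_y - 2 Cov
    return m, v_m, math.sqrt(n) * m / math.sqrt(v_m)


x = [2.0, 3.5, 1.0, 4.2, 2.8, 3.3]
y = [1.5, 3.0, 1.4, 3.1, 2.2, 3.4]
print(paired_moment_test(x, y, power=1))   # Example 1: means
print(paired_moment_test(x, y, power=2))   # Example 2: second moments

# Example 1 equals the one-sample t-test on the differences d_i = x_i - y_i.
d = [a - b for a, b in zip(x, y)]
print(statistics.mean(d), statistics.variance(d))
```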
4.3 Dummy Variable Approach
Define a dummy such that
$$d_i = \begin{cases} 1 & \text{if } i \in \text{treated group} \\ 0 & \text{if } i \in \text{controlled group}\end{cases}.$$
Run the following regression:
$$y_i = \alpha + \beta d_i + u_i,$$
where $y_i$ includes all treated and controlled observations.
We haven't learned how to run a linear regression yet, so we will study the statistical properties of linear regression in the next section.
5 Ordinary Least Squares (OLS) with a Single Regressor
5.1 What is OLS?
Regression model:
$$y_i = \alpha + \beta x_i + u_i$$
- $y_i$: regressand
- $x_i$: regressor
- $u_i$: error (not residual), with mean zero
- $\alpha$: constant or intercept
- $\beta$: slope coefficient
How to estimate $\alpha$ and $\beta$:
$$\hat\beta = \frac{\sum_{i=1}^n \left(x_i - \frac{1}{n}\sum_{i=1}^n x_i\right)\left(y_i - \frac{1}{n}\sum_{i=1}^n y_i\right)}{\sum_{i=1}^n \left(x_i - \frac{1}{n}\sum_{i=1}^n x_i\right)^2}, \qquad (1)$$
$$\hat\alpha = \frac{1}{n}\sum_{i=1}^n y_i - \hat\beta\,\frac{1}{n}\sum_{i=1}^n x_i. \qquad (2)$$
Alternatively,
$$(\hat\alpha, \hat\beta) = \arg\min_{\alpha,\beta}\sum_{i=1}^n u_i^2 = \arg\min_{\alpha,\beta}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2. \qquad (3)$$
– Least squares: literally, minimize the sum of the squared errors.
– The solution of the least squares problem in (3) is given by (1) and (2).
How to test $\alpha$ or $\beta$
We need to calculate the variance of $\hat\beta$. Note that, by definition,
$$V(\hat\beta) = E(\hat\beta - \beta)^2.$$
Observe this. Write $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$, and similarly $\bar y$ and $\bar u$. Substituting $y_i = \alpha + \beta x_i + u_i$ and $\bar y = \alpha + \beta\bar x + \bar u$ into (1),
$$\hat\beta = \frac{\sum_{i=1}^n (x_i - \bar x)\left[\beta(x_i - \bar x) + (u_i - \bar u)\right]}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta + \frac{\sum_{i=1}^n (x_i - \bar x)(u_i - \bar u)}{\sum_{i=1}^n (x_i - \bar x)^2}.$$
Hence
$$\hat\beta - \beta = \frac{\sum_{i=1}^n (x_i - \bar x)(u_i - \bar u)}{\sum_{i=1}^n (x_i - \bar x)^2}.$$
Next,
$$E(\hat\beta - \beta)^2 = E\left[\frac{\sum_{i=1}^n (x_i - \bar x)(u_i - \bar u)}{\sum_{i=1}^n (x_i - \bar x)^2}\right]^2.$$
It can be simplified as
$$E(\hat\beta - \beta)^2 \simeq \frac{\sigma_u^2}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sigma_u^2}{(n-1)\hat\sigma_x^2}, \quad \hat\sigma_x^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2.$$
(Why? Expand the square and use $E(u_iu_j) = 0$ for $i \neq j$ and $E(u_i^2) = \sigma_u^2$, treating $x_i$ as given.)
Then
$$t_{\hat\beta} = \frac{\hat\beta - \beta}{\sqrt{\hat\sigma_u^2/[(n-1)\hat\sigma_x^2]}} \xrightarrow{d} N(0, 1).$$
Complicated, isn't it? So what do you have to remember?
$$\hat\theta = (X'X)^{-1}X'y$$
– What is $\theta$? $\theta = \begin{bmatrix}\alpha \\ \beta\end{bmatrix}$.
– What is $X$? $X = \begin{bmatrix}1 & x_1 \\ \vdots & \vdots \\ 1 & x_n\end{bmatrix}$, $y = \begin{bmatrix}y_1 \\ \vdots \\ y_n\end{bmatrix}$.
– $(X'X)$ is a $2 \times 2$ matrix:
$$X'X = \begin{bmatrix}\sum_{i=1}^n 1 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2\end{bmatrix} = \begin{bmatrix}n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2\end{bmatrix}.$$
– Next, $V(\hat\theta) = \hat\sigma_u^2 (X'X)^{-1}$ is a $2 \times 2$ matrix,
$$V(\hat\theta) = \begin{bmatrix}V(\hat\alpha) & Cov(\hat\alpha, \hat\beta) \\ Cov(\hat\alpha, \hat\beta) & V(\hat\beta)\end{bmatrix},$$
and
$$\hat\sigma_u^2 = \frac{1}{n-1}\sum_{i=1}^n \hat u_i^2, \quad \hat u_i = y_i - \hat\alpha - \hat\beta x_i.$$
– $\hat u_i$ is called the 'residual'.
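– The t-statistics (for $H_0: \alpha = 0$ and $H_0: \beta = 0$) are
$$t_{\hat\alpha} = \frac{\hat\alpha}{\sqrt{V(\hat\alpha)}}, \quad t_{\hat\beta} = \frac{\hat\beta}{\sqrt{V(\hat\beta)}}.$$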
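Formulas (1) and (2) and the matrix variance $\hat\sigma_u^2(X'X)^{-1}$ can be sketched for the single-regressor case. The 2×2 inverse is written out explicitly; $\hat\sigma_u^2$ uses the divisor n − 1 as in these notes (many textbooks use n − 2 because two parameters are estimated). The function name and the data are made up for illustration.

```python
import math


def ols_simple(x, y):
    """OLS with one regressor: formulas (1)-(2), V(theta) = s2 (X'X)^{-1}, and t-statistics."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx                                     # (1)
    alpha = ybar - beta * xbar                           # (2)
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 1)             # sigma_u^2 hat, divisor as in the notes
    # X'X = [[n, sum x], [sum x, sum x^2]]; the diagonal of its inverse is [sum x^2, n] / det
    sum_x, sum_x2 = sum(x), sum(xi * xi for xi in x)
    det = n * sum_x2 - sum_x ** 2
    v_alpha, v_beta = s2 * sum_x2 / det, s2 * n / det
    return {"alpha": alpha, "beta": beta, "resid": resid,
            "t_alpha": alpha / math.sqrt(v_alpha), "t_beta": beta / math.sqrt(v_beta)}


x = [0, 1, 2, 3, 4, 5]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]
fit = ols_simple(x, y)
print(fit["alpha"], fit["beta"], fit["t_alpha"], fit["t_beta"])
```

The residuals satisfy $\sum\hat u_i = 0$ and $\sum x_i\hat u_i = 0$, the first-order conditions of (3).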
5.2 Dummy Variable Regression
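Define
$$d_i = \begin{cases} 1 & \text{if } i \in \text{treated group} \\ 0 & \text{if } i \in \text{controlled group}\end{cases}.$$
The dummy variable regression is given by
$$y_i = \alpha + \beta d_i + u_i.$$
Economic meaning: let $n_{Tr}$ be the total number of treated observations and $n_C$ the total number of controlled observations. Then we have
$$\frac{1}{n_{Tr}}\sum_{i \in Tr} y_i = \hat\mu_{Tr} = \alpha + \beta + \frac{1}{n_{Tr}}\sum_{i \in Tr} u_i \simeq \alpha + \beta,$$
$$\frac{1}{n_C}\sum_{i \in C} y_i = \hat\mu_C = \alpha + \frac{1}{n_C}\sum_{i \in C} u_i \simeq \alpha.$$
Hence if $\beta = 0$, then there is no mean difference between the two samples.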
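A numerical sketch of the economic meaning above: regressing y on a constant and the dummy gives $\hat\alpha$ equal to the control-group mean and $\hat\beta$ equal to the difference between the treated and control means. The group values and the helper name are made up.

```python
def dummy_regression(treated, control):
    """OLS of y_i = alpha + beta * d_i + u_i with d_i = 1 for treated and 0 for control."""
    y = list(treated) + list(control)
    d = [1.0] * len(treated) + [0.0] * len(control)
    n = len(y)
    dbar, ybar = sum(d) / n, sum(y) / n
    beta = (sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y))
            / sum((di - dbar) ** 2 for di in d))
    alpha = ybar - beta * dbar
    return alpha, beta


treated, control = [5.0, 7.0], [1.0, 2.0, 3.0]
alpha_hat, beta_hat = dummy_regression(treated, control)
print(alpha_hat, beta_hat)   # control mean 2 and mean difference 6 - 2 = 4, up to rounding
```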
6 Problem Set 2
Use PCE_Inf_90item.xls. Choose the first two years; check the assignment sheet to find which years you need to use for this exercise.
1. Plot the two density functions jointly (in one graph, using red and black).
2. Test whether or not the two samples share the same distribution (KS test).
3. Test whether or not the two samples share the same mean.
4. Test whether or not the two samples share the same variance.
5. Run the dummy variable regression. Test whether or not the two samples share the same mean.
6. Choose the first and third years. Repeat 1 through 5.
7 Case Study: Gender Income Gap
Data: annual income survey (pseudo cross-sectional data).
Finding: male income has been higher than female income.
Source: BLS
The first figure shows that female earnings are catching up with male earnings.
1. The two samples are correlated with each other.
2. If you fix a year and collect two cross-sectional samples, then the two samples contain some other information. What is it?
3. What about the future?
Source: "For Gender Inequality Deniers, Here's the Contrary Evidence," April 10, 2014, by Michael Morrison.
The next figure shows the two density functions jointly.
1. It looks like the two samples are somewhat different. Is it true?
2. Can you use the KS test to test for equal distributions? If not, why?
The next figure shows that males' SAT scores are relatively higher than females' SAT scores. Does it imply that females' earnings are lower than males' earnings in general?
[Figure: SAT score distribution]
1. If this claim is true, the historical SAT score difference should follow a pattern similar to the earnings difference. See the next figure.
[Figure: Historical SAT scores]
2. Obviously, the difference between male and female SAT scores has not changed much.
Conclusion
- The earnings difference between females and males is getting smaller.
- We can't test for equal distributions, because the KS test fails when the two samples are dependent.
- Other variables, such as SAT scores, do not help much to explain the gender earnings difference.
- There must be other reasons. What are they?
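8 Ordinary Least Squares (OLS) with Multiple Regressors
Purpose of this section
- Learn the role of missing variables.
- Learn how to interpret a figure using two variables.
8.1 Basic Statistical Methods
Consider the following regression with two independent variables:
$$y_i = \alpha + \beta_1 x_i + \beta_2 z_i + u_i. \qquad (4)$$
Then
$$\hat\theta_{ols} = (X'X)^{-1}X'y, \quad \text{where } X = \begin{bmatrix}1 & x_1 & z_1 \\ \vdots & \vdots & \vdots \\ 1 & x_n & z_n\end{bmatrix} \text{ and } \theta = (\alpha, \beta_1, \beta_2)'. \qquad (5)$$
The variance of $\hat\theta$ is given by
$$\hat\Sigma = \hat\sigma_u^2 (X'X)^{-1}.$$
Let $\hat\sigma_{i,j}$ be the element in the $i$th row and the $j$th column of $\hat\Sigma$. Then the t-statistics are given by
$$t_{\hat\beta_1} = \frac{\hat\beta_1}{\sqrt{\hat\sigma_{2,2}}}, \quad t_{\hat\beta_2} = \frac{\hat\beta_2}{\sqrt{\hat\sigma_{3,3}}}.$$
8.2 Role of the Second Variable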
Consider the following two regressions:
$$y_i = \alpha_1 + \gamma_1 x_i + e_i, \qquad (6)$$
$$y_i = \alpha_2 + \gamma_2 z_i + v_i. \qquad (7)$$
(The slopes in (6) and (7) are written $\gamma_1$ and $\gamma_2$ to distinguish them from $\beta_1$ and $\beta_2$ in (4).) In regression (6), you are interested in the relationship between $y_i$ and $x_i$; in regression (7), in the relationship between $y_i$ and $z_i$. Suppose that you have $\hat\gamma_1 = 1$ and $\hat\gamma_2 = -1$.
If $x_i$ is not correlated with $z_i$, then
$$\hat\beta_1 \text{ in (4)} \simeq \hat\gamma_1 \text{ in (6)}, \quad \hat\beta_2 \text{ in (4)} \simeq \hat\gamma_2 \text{ in (7)}.$$
Why? From the OLS estimator formula in (5), we have
$$\hat\beta_1 = \frac{\sum\tilde x_i\tilde y_i\sum\tilde z_i^2 - \sum\tilde x_i\tilde z_i\sum\tilde z_i\tilde y_i}{\sum\tilde x_i^2\sum\tilde z_i^2 - \left(\sum\tilde x_i\tilde z_i\right)^2}, \quad \hat\beta_2 = \frac{\sum\tilde z_i\tilde y_i\sum\tilde x_i^2 - \sum\tilde x_i\tilde z_i\sum\tilde x_i\tilde y_i}{\sum\tilde x_i^2\sum\tilde z_i^2 - \left(\sum\tilde x_i\tilde z_i\right)^2},$$
where $\tilde x_i = x_i - n^{-1}\sum x_i$, $\tilde z_i = z_i - n^{-1}\sum z_i$, and $\tilde y_i = y_i - n^{-1}\sum y_i$. That is, '~' stands for the deviation from the mean. Next, if $\tilde x_i$ is not correlated with $\tilde z_i$, then the sample cross moment must be near zero:
$$\frac{1}{n}\sum\tilde x_i\tilde z_i \simeq 0 \iff \sum\tilde x_i\tilde z_i \simeq 0.$$
Then we have
$$\hat\beta_1 \simeq \frac{\sum\tilde x_i\tilde y_i\sum\tilde z_i^2}{\sum\tilde x_i^2\sum\tilde z_i^2} = \frac{\sum\tilde x_i\tilde y_i}{\sum\tilde x_i^2} = \hat\gamma_1.$$
Usually two independent variables (or regressors) are correlated with each other. In other words, $x_i$ is correlated with $z_i$. In this case,
$$\hat\beta_1 \neq \hat\gamma_1.$$
So what does it mean? How can we interpret $\hat\beta_1$ then?
If $x_i$ is exogenous, then the inclusion of additional variables does not alter the regression result.
- Exogenous? What is this? If $E(x_ie_j) = 0$ for all $i$ and $j$ in (6), then $x_i$ is exogenous to $e_i$.
- If $x_i$ is exogenous, is $E(x_iv_j) = 0$ in (7)? No. $x_i$ is exogenous only to $e_i$.
- How do we know whether $x_i$ is exogenous?
– It is not easy to test, because $e_i$ is unknown.
– The so-called 'Hausman test' is used, but this test is valid only under very restrictive conditions. We will study this later.
- Then what should we do?
– Include many variables, and check whether or not $\hat\beta_1$ changes.
– We call them 'control variables'.
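The deviation-from-mean formulas for $\hat\beta_1$ and $\hat\beta_2$ can be coded directly. The sketch compares them with the short regression (6) in two hypothetical designs: one where x and z are exactly uncorrelated in the sample, and a simulation where z = 0.8x + v and y = 1 + x − z + u. It also checks the in-sample identity $\hat\gamma_1 = \hat\beta_1 + \hat\beta_2\hat\delta$, where $\hat\delta$ is the slope from regressing z on x, which shows exactly how the short and long regressions differ.

```python
import random


def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def long_regression(y, x, z):
    """beta_1 and beta_2 in (4) from the deviation-from-mean formulas."""
    yt, xt, zt = demean(y), demean(x), demean(z)
    sxx = sum(a * a for a in xt)
    szz = sum(a * a for a in zt)
    sxz = sum(a * b for a, b in zip(xt, zt))
    sxy = sum(a * b for a, b in zip(xt, yt))
    szy = sum(a * b for a, b in zip(zt, yt))
    det = sxx * szz - sxz ** 2
    return (sxy * szz - sxz * szy) / det, (szy * sxx - sxz * sxy) / det


def short_regression(y, x):
    """Slope of y on a constant and x only, as in (6)."""
    yt, xt = demean(y), demean(x)
    return sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)


# Case 1: x and z exactly uncorrelated in the sample, so beta_1 hat equals gamma_1 hat.
x0, z0, y0 = [1, -1, 1, -1], [1, 1, -1, -1], [5.0, 1.0, 6.5, 3.0]
print(long_regression(y0, x0, z0)[0], short_regression(y0, x0))

# Case 2 (hypothetical simulation): z = 0.8 x + v and y = 1 + x - z + u.
rng = random.Random(13)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.8 * a + rng.gauss(0, 1) for a in x]
y = [1 + a - b + rng.gauss(0, 1) for a, b in zip(x, z)]
b1, b2 = long_regression(y, x, z)
g1 = short_regression(y, x)
delta = short_regression(z, x)          # slope of z on x
print(b1, g1, b1 + b2 * delta)          # b1 near 1, g1 near 1 - 0.8 = 0.2
```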
Graphical Approach. Consider the following case:
$$y = a + \beta z + u: \text{ true relationship},$$
$$z = \kappa x + v: \text{ correlated variable},$$
$$y = \alpha + \delta x + e: \ x \text{ is not exogenous}.$$
[Figure: two scatter plots; left panel "y and x", right panel "$M_z y$ and $M_z x$".]
What is $M_z y$?
$$M_z = I - z(z'z)^{-1}z', \qquad M_z y = y - z(z'z)^{-1}z'y.$$
What is this? The residual of $y$ after controlling out the information in $z$. Similarly, $M_z x$ is the residual of $x$ after controlling out the information in $z$.
Then $M_z y$ and $M_z x$ should not have any relationship.
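The two panels of the figure can be mimicked by simulation. This sketch assumes y = 1 + 2z + u and z = x + v with standard normal x, u, v (all hypothetical choices). Unlike $M_z$ above, `residualize` also partials out a constant, which is the usual practice when the regression has an intercept.

```python
import random


def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def residualize(v, z):
    """M_z v with a constant included: residuals from regressing v on (1, z)."""
    vt, zt = demean(v), demean(z)
    b = sum(a * c for a, c in zip(zt, vt)) / sum(a * a for a in zt)
    return [a - b * c for a, c in zip(vt, zt)]


def slope(y, x):
    """OLS slope of y on x (both already demeaned or residualized)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


rng = random.Random(4)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [a + rng.gauss(0, 1) for a in x]                 # z = x + v
y = [1.0 + 2.0 * b + rng.gauss(0, 1) for b in z]     # y = 1 + 2 z + u

raw = slope(demean(y), demean(x))                        # left panel: y on x, near 2
partial = slope(residualize(y, z), residualize(x, z))    # right panel: M_z y on M_z x, near 0
print(raw, partial)
```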
8.3 Case Study 2: Birth Order and SAT Scores
Zajonc and Bargh (1980, American Psychologist), Figure 1 on page 667, July 1980
– They ran
$$SAT_i = \alpha + \beta BO_i + u_i,$$
where $BO_i$ stands for birth order.
– Result: birth order explains SAT scores well.
Rodgers, Cleveland, Oord and Rowe (2000, American Psychologist)
– They ran
$$SAT_i = \alpha + \beta_1 BO_i + \beta_2 MIQ_i + e_i,$$
where $MIQ_i$ stands for the mother's IQ.
– Low-IQ mothers have bigger families.
– Result: BO is not related to SAT scores.
Conclusion
1. Do not believe any two-variable regression result.
2. Check whether or not the regression of interest includes (reasonable) control variables.
3. Even when the control variables are reasonable, if the regressor of interest is time varying, then read the next chapter.
9 Measurement Error
We will study the difference between true and pseudo cross-sectional regressions. Consider the following example:
$$SAT_i = \alpha + \beta PI_i + u_i,$$
where $SAT_i$ is the $i$th person's SAT score and $PI_i$ is the parents' income in a particular year. $PI_i$ can change over time. Suppose a couple has two kids whose age difference is 10 years, and assume that the parents' income increases over time. When their first kid took the SAT, their income was 50K; 10 years later, it had increased to 150K. If $\beta > 0$, does it mean that the second kid's SAT score is higher than the first kid's?
[Figure 9-1: Fitted SAT Scores and Parents Income, 2013. Vertical axis: fitted SAT scores (1200 to 1800); horizontal axis: parents' income (0 to 350,000 dollars).]
Rewrite the regression as
$$y_i = \alpha + \beta x_{it} + u_i,$$
where $x_{it}$ is chosen at a particular $t$, and
$$x_{it} = \underbrace{\mu_i}_{(1)\text{ target variable}} + \underbrace{\lambda_i\theta_t}_{(2)\text{ ind. influence} \times \text{common factor}} + \underbrace{x^o_{it}}_{(3)\text{ pure individual-specific term}}.$$
Three components: (1) the time-invariant component $\mu_i$, (2) the common component $\lambda_i\theta_t$, and (3) the pure idiosyncratic term $x^o_{it}$.
Example:
– $\mu_i$ => your overall wealth.
– $\theta_t$ => economy-wide conditions.
– $\lambda_i$ => how the economic conditions influence you.
– $x^o_{it}$ => individual-specific income variation (transitory income).
You need to use $\mu_i$.
The true regression you need to run is
$$y_i = \alpha + \beta\mu_i + e_i. \qquad (8)$$
The actual regression you ran was
$$y_i = \alpha + \beta x_i + u_i. \qquad (9)$$
We will rewrite (8) as
$$y_i = \alpha + \beta\mu_i + e_i = \alpha + \beta(\mu_i + \varepsilon_i) + e_i - \beta\varepsilon_i = \alpha + \beta x_i + u_i,$$
where
$$x_i = \mu_i + \varepsilon_i, \quad u_i = e_i - \beta\varepsilon_i, \quad \text{and} \quad \varepsilon_i = \lambda_i\theta_t + x^o_{it}.$$
Then it is easy to show that $E(x_iu_i) \neq 0$; for example, if $\varepsilon_i$ is uncorrelated with $\mu_i$ and $e_i$, then $E(x_iu_i) = -\beta E(\varepsilon_i^2)$.
We call $\varepsilon_i$ the "measurement error".
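A simulation sketch of the measurement-error problem, assuming the observed regressor is $x_i = \mu_i + \varepsilon_i$ with $\mu_i$, $\varepsilon_i$ and $e_i$ independent standard normal variables and β = 1 (hypothetical values). Under these assumptions the slope in (9) converges to $\beta\,\sigma_\mu^2/(\sigma_\mu^2 + \sigma_\varepsilon^2) = 0.5$, the classic attenuation bias, while the slope in (8) recovers β.

```python
import random


def ols_slope(y, x):
    """OLS slope of y on a constant and x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    return sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)


rng = random.Random(5)
n, beta = 20000, 1.0
mu = [rng.gauss(0, 1) for _ in range(n)]              # mu_i: time-invariant target variable
eps = [rng.gauss(0, 1) for _ in range(n)]             # eps_i = lambda_i theta_t + x_it^o
y = [2.0 + beta * m + rng.gauss(0, 1) for m in mu]    # true regression (8) with alpha = 2
x = [m + e for m, e in zip(mu, eps)]                  # regressor actually observed

b_true = ols_slope(y, mu)      # regression (8): near beta = 1
b_obs = ols_slope(y, x)        # regression (9): near beta * 1 / (1 + 1) = 0.5
print(b_true, b_obs)
```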
Conclusion
1. If the regressor is time varying, then the regressor must contain measurement error.
2. The regression result is not robust: it will change over time.
3. In a true cross-sectional regression, no variable (including the control variables) is even potentially time varying.
10 Problem Set 3
1. Learn how to access the World Development Indicators.
2. Choose any two variables. Run a cross-sectional regression for each year and report the results. Do the results change?
3. Choose an additional variable. Run a regression for a particular year. Report the result only when the original regressor becomes insignificant.
Part II
Time Series Data
The first thing to do is to plot the time series.
We will study
- Autoregressive processes
- Vector autoregressive processes
- Unit root tests
- Cointegration tests
- Error correction models
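11 Autoregressive (AR) Model
11.1 AR(1) Model
$$x_t = a + \rho x_{t-1} + u_t,$$
where $u_t$ is i.i.d. with variance $\sigma_u^2$. We call this model AR(1).
Features of the AR(1) model (for $|\rho| < 1$)
$$E(x_t) = \mu = a/(1-\rho) \neq a \text{ if } \rho \neq 0,$$
and, writing $\tilde x_t = x_t - \mu$,
$$E(\tilde x_t^2) = \sigma_u^2/(1-\rho^2) = \sigma_x^2, \quad E(\tilde x_t\tilde x_{t-1}) = \rho\sigma_x^2, \quad E(\tilde x_t\tilde x_{t-2}) = \rho^2\sigma_x^2, \quad E(\tilde x_t\tilde x_{t-k}) = \rho^k\sigma_x^2 \neq 0.$$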
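Estimation of the Mean. Assume that
$$x_t = \mu + e_t, \quad e_t = \rho e_{t-1} + u_t.$$
Then we have $\rho x_{t-1} = \rho\mu + \rho e_{t-1}$, so that
$$x_t = \mu(1-\rho) + \rho x_{t-1} + u_t.$$
We want to estimate $\mu$ by
$$\hat\mu = \frac{1}{T}\sum_{t=1}^T x_t.$$
The variance of $\hat\mu$ is given by
$$E(\hat\mu - \mu)^2 = E\left(\frac{1}{T}\sum_{t=1}^T e_t\right)^2.$$
Note that $(A + B)^2 = A^2 + B^2 + 2AB$. Hence we have
$$\left(\frac{1}{T}\sum_{t=1}^T e_t\right)^2 = \frac{1}{T^2}\sum_{t=1}^T e_t^2 + \frac{1}{T^2}\sum_{t=1}^T\sum_{s\neq t} e_te_s = \frac{1}{T^2}\sum_{t=1}^T e_t^2 + \frac{2}{T^2}\sum_{t=2}^T\sum_{s=1}^{t-1} e_te_s.$$
But since $Ee_te_{t-k} = \rho^k\sigma_x^2 \neq 0$, the second term does not vanish in expectation:
$$E(\hat\mu - \mu)^2 \neq \frac{1}{T^2}\sum_{t=1}^T Ee_t^2.$$
This implies that the usual t-statistic,
$$\frac{\sqrt{T}(\hat\mu - \mu)}{\sqrt{\frac{1}{T}\sum_{t=1}^T\left(x_t - \frac{1}{T}\sum_{t=1}^T x_t\right)^2}},$$
is not the correct $t_{\hat\mu} = (\hat\mu - \mu)/\sqrt{V(\hat\mu)}$, and it is not $N(0, 1)$ even asymptotically. Hence we need to use the 'correct' variance. (For the AR(1) case, $T \cdot V(\hat\mu) \to \sigma_u^2/(1-\rho)^2$.)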
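HAC (Heteroskedasticity and Autocorrelation Consistent) Estimator. There are various HAC estimators available; among them, Newey and West's estimator is the most popular. If there is no heteroskedasticity (a variance that differs over time), the HAC estimator is simply an estimator of the long-run variance, that is, the variance of $\sqrt{T}(\hat\mu - \mu)$. NW's HAC estimator is given by
$$\hat V(\hat\mu)_{NW} = \frac{1}{T}\sum_{t=1}^T \tilde x_t^2 + 2\sum_{m=1}^M\left(1 - \frac{m}{M+1}\right)\frac{1}{T}\sum_{t=m+1}^T \tilde x_t\tilde x_{t-m}, \quad \tilde x_t = x_t - \frac{1}{T}\sum_{t=1}^T x_t,$$
where usually $M = \text{int}(T^{1/3})$. Note that $M$ is called NW's lag. The t-statistic then becomes $\sqrt{T}(\hat\mu - \mu)/\sqrt{\hat V(\hat\mu)_{NW}}$.
What to do with this? We will need this concept later for general regressions. So wait.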
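A sketch of the Newey-West estimator with Bartlett weights $1 - m/(M+1)$ and $M = \text{int}(T^{1/3})$, applied to a simulated AR(1) with ρ = 0.5 and unit shock variance (hypothetical values). For this process the long-run variance is $\sigma_u^2/(1-\rho)^2 = 4$, three times $\sigma_x^2 = 4/3$, so the naive t-statistic is too large by a factor of about $\sqrt{3}$.

```python
import random


def newey_west_lrv(x, M=None):
    """Newey-West (Bartlett) estimate of the long-run variance of sqrt(T) * (x_bar - mu)."""
    T = len(x)
    if M is None:
        M = int(T ** (1 / 3))                    # the usual choice M = int(T^(1/3))
    xbar = sum(x) / T
    d = [v - xbar for v in x]
    lrv = sum(v * v for v in d) / T              # lag-0 term
    for m in range(1, M + 1):
        gamma_m = sum(d[t] * d[t - m] for t in range(m, T)) / T
        lrv += 2 * (1 - m / (M + 1)) * gamma_m   # Bartlett weight 1 - m/(M+1)
    return lrv


def simulate_ar1(T, rho, a=0.0, seed=6):
    """x_t = a + rho x_{t-1} + u_t with N(0, 1) shocks, started at the mean a / (1 - rho)."""
    rng = random.Random(seed)
    x, out = a / (1 - rho), []
    for _ in range(T):
        x = a + rho * x + rng.gauss(0, 1)
        out.append(x)
    return out


x = simulate_ar1(20000, rho=0.5)
xbar = sum(x) / len(x)
naive = sum((v - xbar) ** 2 for v in x) / len(x)     # ignores autocorrelation, near 4/3
nw = newey_west_lrv(x)                               # near 4 (slightly below, Bartlett bias)
print(naive, nw)
print("naive t:", len(x) ** 0.5 * xbar / naive ** 0.5, "NW t:", len(x) ** 0.5 * xbar / nw ** 0.5)
```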
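Estimation of $\rho$. The OLS estimator is given by
$$\hat\rho = \frac{\sum(x_t - T^{-1}\sum x_t)(x_{t-1} - T^{-1}\sum x_{t-1})}{\sum(x_{t-1} - T^{-1}\sum x_{t-1})^2} = \rho + \frac{\sum(x_{t-1} - T^{-1}\sum x_{t-1})(u_t - T^{-1}\sum u_t)}{\sum(x_{t-1} - T^{-1}\sum x_{t-1})^2}.$$
Properties of $\hat\rho$
- $E(\hat\rho) \simeq \rho - \frac{1+3\rho}{T}$: it's biased. But $\hat\rho \to \rho$ as $T \to \infty$.
- The distribution of $\hat\rho$ is asymmetric if $T$ is not large enough.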
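The small-sample bias of $\hat\rho$ can be checked by Monte Carlo. This sketch assumes ρ = 0.5, standard normal shocks and a stationary starting value; the average $\hat\rho - \rho$ at T = 50 should be close to $-(1+3\rho)/T = -0.05$, and the bias shrinks as T grows.

```python
import random


def ols_rho(x):
    """OLS slope of x_t on a constant and x_{t-1}."""
    y, lag = x[1:], x[:-1]
    n = len(y)
    yb, lb = sum(y) / n, sum(lag) / n
    num = sum((a - yb) * (b - lb) for a, b in zip(y, lag))
    den = sum((b - lb) ** 2 for b in lag)
    return num / den


def mean_rho_hat(T, rho, reps=2000, seed=7):
    """Average OLS estimate over reps simulated AR(1) samples of length T."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0, 1) / (1 - rho * rho) ** 0.5]   # stationary starting value
        for _ in range(T - 1):
            x.append(rho * x[-1] + rng.gauss(0, 1))
        total += ols_rho(x)
    return total / reps


bias = mean_rho_hat(50, 0.5) - 0.5
print(bias, -(1 + 3 * 0.5) / 50)   # simulated bias vs. the approximation -0.05
```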
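11.2 AR(2)
$$x_t = a + \rho_1 x_{t-1} + \rho_2 x_{t-2} + u_t$$
Transformation: observe this.
$$x_t = \mu + e_t, \quad e_t = \rho_1 e_{t-1} + \rho_2 e_{t-2} + u_t,$$
$$\rho_1 x_{t-1} = \rho_1\mu + \rho_1 e_{t-1}, \quad \rho_2 x_{t-2} = \rho_2\mu + \rho_2 e_{t-2},$$
so that
$$x_t = \mu[1 - (\rho_1 + \rho_2)] + \rho_1 x_{t-1} + \rho_2 x_{t-2} + u_t.$$
Next, let $\rho = \rho_1 + \rho_2$. We call $\rho$ the 'dominant root'. Then we have
$$x_t = a + (\rho_1 + \rho_2)x_{t-1} - \rho_2 x_{t-1} + \rho_2 x_{t-2} + u_t = a + \rho x_{t-1} - \rho_2(x_{t-1} - x_{t-2}) + u_t = a + \rho x_{t-1} - \rho_2\Delta x_{t-1} + u_t,$$
where $\Delta x_t = x_t - x_{t-1}$. One more step, subtracting $x_{t-1}$ from both sides:
$$\Delta x_t = a + (\rho - 1)x_{t-1} - \rho_2\Delta x_{t-1} + u_t.$$
In general, an AR(p) model can be written as
$$x_t = a + \sum_{j=1}^p \rho_j x_{t-j} + u_t \quad\text{or}\quad x_t = a + \rho x_{t-1} + \sum_{j=1}^{p-1}\phi_j\Delta x_{t-j} + u_t,$$
with $\rho = \sum_{j=1}^p \rho_j$ and $\phi_j = -\sum_{k=j+1}^p \rho_k$, or
$$\Delta x_t = a + (\rho - 1)x_{t-1} + \sum_{j=1}^{p-1}\phi_j\Delta x_{t-j} + u_t.$$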
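Lag Selection. There are several lag selection methods: AIC, BIC, PIC, etc. Among them, AIC tends to overestimate the lag length; however, the probability of underestimation with AIC is almost zero. BIC estimates the lag length consistently, but the probability of underestimation with BIC is not always zero in finite samples.
So what? Suppose that $x_t$ follows an AR(2), but you estimate an AR(1). Then what happens? With $\Delta x_{t-1} = x_{t-1} - x_{t-2}$,
$$x_t = a + \rho x_{t-1} - \rho_2\Delta x_{t-1} + u_t: \text{ true model},$$
$$x_t = a + \rho x_{t-1} + e_t: \text{ estimated regression},$$
so that
$$e_t = -\rho_2\Delta x_{t-1} + u_t.$$
Because $e_t$ is correlated with $x_{t-1}$, $\hat\rho$ becomes inconsistent. In other words, even when $T \to \infty$, $\hat\rho \nrightarrow \rho$. So AIC, which rarely underestimates the lag length, is the safer choice in finite samples.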
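A sketch of AIC and BIC lag selection for an AR(p), using the common forms $\log\hat\sigma^2 + 2k/n$ and $\log\hat\sigma^2 + k\log n/n$ with k = p + 1 estimated coefficients and the same estimation sample for every p. The data come from a hypothetical AR(2) with $\rho_1 = 0.5$ and $\rho_2 = 0.3$; OLS is solved through the normal equations to stay within the standard library.

```python
import math
import random


def ols_coef(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y via Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * v for r, v in zip(X, y))]
         for i in range(k)]
    for c in range(k):                                   # forward elimination, partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [0.0] * k
    for c in reversed(range(k)):                         # back substitution
        b[c] = (A[c][k] - sum(A[c][j] * b[j] for j in range(c + 1, k))) / A[c][c]
    return b


def select_lag(x, pmax):
    """Return the AR lag chosen by AIC and by BIC on the common sample t = pmax, ..., T-1."""
    n = len(x) - pmax
    aic, bic = {}, {}
    for p in range(1, pmax + 1):
        X = [[1.0] + [x[t - j] for j in range(1, p + 1)] for t in range(pmax, len(x))]
        y = x[pmax:]
        b = ols_coef(X, y)
        s2 = sum((v - sum(bj * rj for bj, rj in zip(b, r))) ** 2 for r, v in zip(X, y)) / n
        k = p + 1
        aic[p] = math.log(s2) + 2 * k / n
        bic[p] = math.log(s2) + k * math.log(n) / n
    return min(aic, key=aic.get), min(bic, key=bic.get)


rng = random.Random(12)
x = [0.0, 0.0]
for _ in range(2200):
    x.append(0.5 * x[-1] + 0.3 * x[-2] + rng.gauss(0, 1))   # true model: AR(2)
x = x[200:]                                                 # drop burn-in
p_aic, p_bic = select_lag(x, pmax=6)
print(p_aic, p_bic)
```

Because the BIC penalty is heavier whenever log n > 2, the BIC choice can never exceed the AIC choice on the same sample, in line with the over- and under-estimation remarks above.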
11.3 Moving Average (MA) Process
The MA(1) process is given by
$$x_t = \theta u_{t-1} + u_t,$$
so that
$$Ex_tx_{t-1} = E(\theta u_{t-1} + u_t)(\theta u_{t-2} + u_{t-1}) = \theta Eu_{t-1}^2 = \theta\sigma_u^2,$$
but
$$Ex_tx_{t-2} = E(\theta u_{t-1} + u_t)(\theta u_{t-3} + u_{t-2}) = 0.$$
It is useful for modeling time series variables with limited autocorrelation.
Also note that a stationary ($|\rho| < 1$) AR(1) process can be converted into an MA($\infty$) process:
$$x_t = \rho x_{t-1} + u_t = \rho(\rho x_{t-2} + u_{t-1}) + u_t = \rho^2 x_{t-2} + \rho u_{t-1} + u_t = \rho^3 x_{t-3} + \rho^2 u_{t-2} + \rho u_{t-1} + u_t = \cdots = \sum_{j=0}^{\infty}\rho^j u_{t-j}.$$
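A numerical check of the MA(1) autocovariances and of the AR(1)-to-MA(∞) expansion, assuming θ = 0.6, ρ = 0.8 and standard normal shocks (illustrative values). Starting the AR(1) at $x_0 = 0$ makes the MA sum finite, so the two computations agree up to rounding.

```python
import random


def autocov(x, k):
    """Sample autocovariance at lag k (divisor T)."""
    n = len(x)
    xb = sum(x) / n
    return sum((x[t] - xb) * (x[t - k] - xb) for t in range(k, n)) / n


rng = random.Random(8)
theta, T = 0.6, 50000
u = [rng.gauss(0, 1) for _ in range(T + 1)]
ma1 = [theta * u[t - 1] + u[t] for t in range(1, T + 1)]   # x_t = theta u_{t-1} + u_t
print(autocov(ma1, 1), autocov(ma1, 2))                    # near theta = 0.6 and 0

# AR(1) started at x_0 = 0: the recursion equals the MA sum over the shocks received so far.
rho = 0.8
shocks = [rng.gauss(0, 1) for _ in range(30)]
x = 0.0
for e in shocks:
    x = rho * x + e
ma_sum = sum(rho ** j * shocks[-1 - j] for j in range(len(shocks)))
print(x, ma_sum)
```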
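11.4 Forecasting
AR(p) processes are useful for forecasting future values:
$$\hat x_{T+1} = \hat a + \hat\rho x_T, \quad \hat x_{T+2} = \hat a + \hat\rho\hat x_{T+1}, \ldots$$
We call this method the 'iterative forecast'.
Alternatively, you can forecast $x_{T+2}$ by running the following regressions:
$$x_t = a + \rho x_{t-1} + u_t: \text{ to forecast } x_{T+1},$$
$$x_t = a^{(2)} + \rho^{(2)}x_{t-2} + v_t: \text{ to forecast } x_{T+2},$$
$$\hat x_{T+2} = \hat a^{(2)} + \hat\rho^{(2)}x_T.$$
This method is called the 'direct forecast'.
If the model is well specified, the iterative forecasting method is (asymptotically) equivalent to the direct forecasting method, since substituting the AR(1) equation into itself gives $a^{(2)} = a(1+\rho)$ and $\rho^{(2)} = \rho^2$.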
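A sketch comparing the iterative and direct two-step forecasts for a simulated AR(1) with a = 1 and ρ = 0.6 (hypothetical values). With a well-specified model and a long sample, the direct slope on $x_{t-2}$ is close to $\hat\rho^2$ and the two forecasts nearly coincide.

```python
import random


def ols_line(y, x):
    """Intercept and slope of y on a constant and x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return yb - b * xb, b


rng = random.Random(9)
a, rho, T = 1.0, 0.6, 5000
x = [a / (1 - rho)]
for _ in range(T - 1):
    x.append(a + rho * x[-1] + rng.gauss(0, 1))

a1, r1 = ols_line(x[1:], x[:-1])          # x_t on x_{t-1}
iterative_1 = a1 + r1 * x[-1]
iterative_2 = a1 + r1 * iterative_1       # iterate the one-step model

a2, r2 = ols_line(x[2:], x[:-2])          # direct: x_t on x_{t-2}
direct_2 = a2 + r2 * x[-1]
print(iterative_2, direct_2, r1 ** 2, r2)
```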
12 Unit Root
12.1 What is a Unit Root?
Random walk models:
$$y_t = y_{t-1} + u_t: \text{ random walk without a drift},$$
$$y_t = a + y_{t-1} + u_t: \text{ random walk with a drift}.$$
Consider a simple AR(1) model given by
$$y_t = \mu(1-\rho) + \rho y_{t-1} + u_t,$$
so if $\rho = 1$, then
$$y_t = y_{t-1} + u_t.$$
Note that (with $y_0 = 0$)
$$y_t = y_{t-1} + u_t = (y_{t-2} + u_{t-1}) + u_t = \cdots = u_1 + \cdots + u_t = \sum_{s=1}^t u_s.$$
Hence $y_t$ includes all past and current shocks, and past shocks never go away.
Random walk model with a drift: it has a linear trending behavior. Observe this.
$$y_t = b + y_{t-1} + u_t = b + (b + y_{t-2} + u_{t-1}) + u_t = 2b + y_{t-2} + u_t + u_{t-1} = \cdots = tb + \sum_{s=1}^t u_s.$$
Features of a Unit Root
1. The mean ($\mu$) is not identified. There is no mean reversion.
2. The variance increases over time: $E(y_t^2) = \sigma_u^2 t$ (random walk without a drift, $y_0 = 0$).
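12.2 Unit Root Test
If $y_t$ does not have a trend, then use (with $\Delta y_t = y_t - y_{t-1}$)
$$\Delta y_t = a + \phi y_{t-1} + \sum_{j=1}^{p-1}\psi_j\Delta y_{t-j} + u_t.$$
Test whether or not $\phi = 0$. If $\phi = 0$, $y_t$ has a unit root (a random walk without a drift when $a = 0$ and there are no lagged differences). If $\phi < 0$, $y_t$ is stationary.
Note that under the null of a unit root, the t-statistic has a Dickey-Fuller (DF) distribution, not a standard normal distribution. The critical values of the DF distribution are larger (in absolute value) than those of the standard normal distribution:
$$t_{\hat\phi} \xrightarrow{d} DF.$$
If $y_t$ has a trend, then use
$$\Delta y_t = a + bt + \phi y_{t-1} + \sum_{j=1}^{p-1}\psi_j\Delta y_{t-j} + u_t.$$
The critical value of $t_{\hat\phi}$ is different: you have to use a smaller (more negative) critical value. (See EViews.)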
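A sketch of the plain Dickey-Fuller regression with a constant and no lagged differences, a simplification of the augmented version above. The commonly quoted asymptotic 5% critical value for this case is about −2.86 (about −3.41 when a trend is added), compared with −1.65 for a one-sided normal test. The random walk and the stationary AR(1) with coefficient 0.5 are simulated for illustration; the residual variance here uses the usual divisor n − 2.

```python
import math
import random


def df_tstat(y):
    """Dickey-Fuller t-statistic: regress dy_t on a constant and y_{t-1}, return t for phi."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lag = y[:-1]
    n = len(dy)
    lb, db = sum(lag) / n, sum(dy) / n
    sll = sum((v - lb) ** 2 for v in lag)
    phi = sum((v - lb) * (w - db) for v, w in zip(lag, dy)) / sll
    c = db - phi * lb
    resid = [w - c - phi * v for v, w in zip(lag, dy)]
    s2 = sum(r * r for r in resid) / (n - 2)
    return phi / math.sqrt(s2 / sll)


CRIT_5PCT_CONST = -2.86   # asymptotic 5% DF critical value, constant but no trend

rng = random.Random(10)
T = 500
rw, ar = [0.0], [0.0]
for _ in range(T - 1):
    rw.append(rw[-1] + rng.gauss(0, 1))           # unit root: phi = 0
    ar.append(0.5 * ar[-1] + rng.gauss(0, 1))     # stationary: phi = -0.5
print(df_tstat(rw), df_tstat(ar), CRIT_5PCT_CONST)
```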
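13 Vector Autoregressive Model (VAR)
Multivariate time series analysis: consider $y_t$ and $x_t$ together. A VAR(1) can be written as
$$\begin{bmatrix}y_t \\ x_t\end{bmatrix} = \begin{bmatrix}a_1 \\ a_2\end{bmatrix} + \begin{bmatrix}\phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22}\end{bmatrix}\begin{bmatrix}y_{t-1} \\ x_{t-1}\end{bmatrix} + \begin{bmatrix}e_{1t} \\ e_{2t}\end{bmatrix}.$$
The first equation becomes
$$y_t = a_1 + \phi_{11}y_{t-1} + \underbrace{\phi_{12}x_{t-1} + e_{1t}}_{=u_t}.$$
In other words, the AR(1) error $u_t$ is decomposed into $x_{t-1}$ and a new error $e_{1t}$.
A VAR(2) is given by
$$\begin{bmatrix}y_t \\ x_t\end{bmatrix} = \begin{bmatrix}a_1 \\ a_2\end{bmatrix} + \begin{bmatrix}\phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22}\end{bmatrix}\begin{bmatrix}y_{t-1} \\ x_{t-1}\end{bmatrix} + \begin{bmatrix}\psi_{11} & \psi_{12} \\ \psi_{21} & \psi_{22}\end{bmatrix}\begin{bmatrix}y_{t-2} \\ x_{t-2}\end{bmatrix} + \begin{bmatrix}e_{1t} \\ e_{2t}\end{bmatrix},$$
and so on. VAR models have been used widely in various areas of economics.
Economic Meaning: Granger Causality. Consider the VAR(1) model again. Granger (1980) proposed a causality definition based on the VAR system.
- If $\phi_{12} \neq 0$, then we say "$x_t$ Granger-causes $y_t$". If $\phi_{21} \neq 0$, similarly, $y_t$ Granger-causes $x_t$.
- If $\phi_{12} = 0$, $y_t$ is exogenous. If $\phi_{21} = 0$, $x_t$ becomes exogenous. Past information is given, so we may treat the lagged values as predetermined.
- When $\phi_{12} \neq 0$: the past information $x_{t-1}$ influences the current behavior of $y_t$. Hence $y_t$ is endogenous.
- When $\phi_{21} \neq 0$: the past information $y_{t-1}$ influences the current behavior of $x_t$. Hence $x_t$ is endogenous.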
13.1 Granger Causality Test
1. Use AIC or BIC to choose the lag length, p.
2. Run the VAR(p) and test whether or not the lagged variables are (jointly) significant.
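Step 2 can be sketched as an F-test that compares the restricted regression (own lags only) with the unrestricted one (own lags plus p lags of the other variable). This is one standard way to test joint significance, not necessarily the exact procedure in EViews; the data come from a hypothetical system in which $x_{t-1}$ enters the y equation but $y_{t-1}$ does not enter the x equation. For p = 1 the 5% critical value of F(1, n − k) is about 3.84 in large samples.

```python
import random


def ssr(X, y):
    """Sum of squared OLS residuals, solving the normal equations by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] + [sum(r[i] * v for r, v in zip(X, y))]
         for i in range(k)]
    for c in range(k):                                   # forward elimination, partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [0.0] * k
    for c in reversed(range(k)):                         # back substitution
        b[c] = (A[c][k] - sum(A[c][j] * b[j] for j in range(c + 1, k))) / A[c][c]
    return sum((v - sum(bj * rj for bj, rj in zip(b, r))) ** 2 for r, v in zip(X, y))


def granger_f(y, x, p):
    """F-statistic for H0: lags 1..p of x do not help predict y (x does not Granger-cause y)."""
    rows = range(p, len(y))
    target = [y[t] for t in rows]
    restricted = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in rows]
    unrestricted = [r + [x[t - j] for j in range(1, p + 1)] for r, t in zip(restricted, rows)]
    ssr_r, ssr_u = ssr(restricted, target), ssr(unrestricted, target)
    n, k = len(target), 1 + 2 * p
    return ((ssr_r - ssr_u) / p) / (ssr_u / (n - k))


rng = random.Random(11)
T = 1000
x, y = [0.0], [0.0]
for _ in range(T - 1):
    x.append(0.5 * x[-1] + rng.gauss(0, 1))                 # x_t depends only on its own past
    y.append(0.3 * y[-1] + 0.5 * x[-2] + rng.gauss(0, 1))   # x_{t-1} enters y_t
f_x_to_y = granger_f(y, x, p=1)
f_y_to_x = granger_f(x, y, p=1)
print(f_x_to_y, f_y_to_x)
```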
Assignment sheet:
No. | Name | Items
1 | Alam, Ali Imran | 1 to 6
2 | Allen, Thomas A | 7 to 12
3 | Anderson, Mitch | 13 to 18
4 | Baird, Katherine | 19 to 24
5 | Burgoyne, Jason | 25 to 30
6 | Clounch, Brandon | 31 to 36
7 | Durden, Abigail | 37 to 42
8 | Goudeau, Nicholas | 43 to 48
9 | Khan, Humza | 49 to 54
10 | Liu, Hsien-Hui | 55 to 60
11 | Roberds, Kyle | 61 to 66
12 | Shukla, Sameer | 67 to 72
13 | Taylor, Evan | 73 to 78
14 | Woods, Britt | 79 to 84
Exercise 4 (Due date: 10/09/15, 5 pm CST)
Q1: Determine the lag length of your data set by using AIC or BIC.
Q2: Run the AR regression, and test whether or not the dominant root $\rho$ is equal to 0.
Q3: Forecast the future inflation rates (up to 4 horizons) by using the iterative and direct methods.
Q4: Test for a unit root in each variable.
Q5: Test Granger causality among the 6 variables.