Statistical Analysis of Quantitative Data
Arkadiusz M. Kowalski
Tomasz M. Napiórkowski
Warsaw 2014
This textbook was prepared for the purposes of the International Doctoral Programme in Management and Economics organized within the Collegium of World Economy at the Warsaw School of Economics.
The textbook is co-financed by the European Union from the European Social Fund.
This textbook is distributed free of charge.
Table of Contents
INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1. BASELINE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1. Equations Explained . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2. Parameter Estimation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3. Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4. Using Statistical Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5. Econometrics Software. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2. DATA TYPES AND STRUCTURAL EQUATIONS DESCRIPTION . . . . . . . . . . 13
2.1. Cross-section, Time-series and Panel data defined . . . . . . . . . . . . . . 13
2.2. Structural Equation Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1. Cross-Section structural equation . . . . . . . . . . . . . . . . . . . . . . 14
2.2.2. Time-Series structural equation . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3. Panel structural equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3. VARIABLES AND DATASET WORK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1. Naming Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2. Examining Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3. Stationarity Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4. Correlation Matrix Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5. Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6. Hypotheses Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7. Dummy Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.7.1. Dummy Variables: Example . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.7.2. Dummy Variables: Pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.8. Data Cleaning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.9. Data Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4. MODEL DETERMINATION. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1. Model Estimation with a Forward Stepwise Method . . . . . . . . . . . . 34
5. MODEL TESTING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.1. Multicollinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2. Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.3. Heteroscedasticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6. MODEL RESULTS INTERPRETATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.1. Interpreting and Testing Variables and Coefficients . . . . . . . . . . . . . 48
6.2. Interpreting Model’s Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
7. FORECASTING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.1. Forecasting as a Model Testing Tool . . . . . . . . . . . . . . . . . . . . . . . . 57
7.2. Forecasting with ARIMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.3. Forecast Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
8. CONCLUSIONS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
A. TRANSITION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
EXAMPLE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Hypothesis Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Correlation matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Unit Root Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Model Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
FINAL REMARKS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
STATISTICAL TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
z-table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
t-table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
F-table at 0.01 level of significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
F-table at 0.025 level of significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
F-table at 0.05 level of significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
F-table at 0.1 level of significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
χ2 distribution table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
NOTES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
INTRODUCTION
Dr Arkadiusz M. Kowalski
Tomasz M. Napiórkowski
Is this book for you? If you are connected with econometrics in any way,
this book is for you. If you are just starting the subject, this book will provide
you with the basic theory and show you how to use it effectively through the
employment of econometric software (EViews,1 for example). On the other hand,
if you already have some experience, then this book will be a useful bring-it-all-together
reference that you may want to visit every time you have a question about
statistical tests, degrees of freedom or other aspects of econometrics and
its uses.
So, now that we know that this book is for you, welcome to the place
where some of the most common econometric theories and methodologies are
explained using a step-by-step look at the blueprint of econometric research,
starting with the raw data and ending with the final ready-to-submit
model, its testing and interpretation. Each chapter uses everyday-language
explanations, so there is no worry that you will be overwhelmed by pages of
equations or words that would require you to carry a dictionary with you. Every
theory and every statistical test is clearly defined and supported by a real-world
example with software outputs and their full interpretation. Take note that this
book is example-heavy and hands-on, with only the essential theory explained.
At the end of the book you will find a full-length example of a research project
that ties directly to the book by following its chapters and subsections. The use of
a full, real-life example that takes the reader from the beginning to the end
reinforces the reader's understanding of the methodologies used. Clear
references to specific sections of the book will provide a deep understanding
of the workflow associated with performing an econometric study.
1 To find out more about this software, please visit: http://www.eviews.com/home.html.
Book Sections
The book is designed around eight chapters and the example.
1. Baseline:
• this section explains basic econometrics and associated notation as
well as what data is used in the examples.
2. Data Types and Structural Equations Description:
• a comprehensive introduction to three most common dataset types
(cross-section, time-series and panel) and a how-to regarding the
construction of the structural equation.
3. Variables and Dataset Work:
• a look at how to efficiently name, examine, test, adjust and record
variables, what they are, how to create and use dummy variables as
well as how to clean the dataset.
4. Model Determination:
• development of the model from the initial stage to its final version by
employing the LM test for additional information in residuals.
5. Model Testing:
• a detailed look at detection, consequences and solutions to problems
of multicollinearity, autocorrelation and heteroscedasticity.
6. Model Results Interpretation:
• interpretation of the estimated and corrected model, its coefficients
(including coefficient testing) and model descriptive statistics like
R-squared.
7. Forecasting:
• a look at forecasting as a model verification tool, ex-ante and ex-post
forecasting using the plug-in method and ARIMA models.
8. Conclusion:
• crafting finishing remarks.
CHAPTER ONE
Baseline
In order to make sure that all readers, regardless of the level of advancement
in the subject, can use this book to their full benefit, this section covers basic
topics and terms needed when working with econometrics and using this book
to its full potential.
Econometrics is the science of crafting mathematical models based on sets of
economic data. As one will come to realize, data can be found on almost anything
that is happening in the world; be it human behavior like consumer spending
or decisions of Federal Reserve Banks on interest rates, all of it is
recorded and used by others for analysis.
Popular sources of data are central banks (e.g., for the U.S., the Federal Reserve
Bank of St. Louis Federal Reserve Economic Data, or FRED, is such an example1),
special government agencies like the U.S. Bureau of Labor Statistics2 and international
organizations in the vein of the World Bank,3 the Organisation for Economic
Co-operation and Development (OECD),4 Eurostat5 and the International Monetary
Fund (IMF).6
1.1. Equations Explained
In this book, all models will be built based on the “skeleton” with n
independent variables as can be seen in Equation 1:
1 For more information see: http://research.stlouisfed.org/fred2/.
2 For more information see: http://bls.gov/.
3 For more information see: http://data.worldbank.org/.
4 For more information see: http://stats.oecd.org/.
5 For more information see: http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/search_database.
6 For more information see: http://www.imf.org/external/data.htm.
Equation 1. Basic structural equation, i.e., the skeleton
Y = β0 + β1X1 + β2X2 + … + βnXn + ε
Source: Authors’ own equation
Here, the following symbols are used:
Y – the dependent variable. It is called the dependent variable because its value depends on other variables. This is the variable that the model is aiming to explain.
β0 – a parameter called the constant term. Its presence is required in all estimated models.
β1 – a parameter called the coefficient of the first independent (or explanatory) variable. When we talk about estimating the model, we are referring to estimating these coefficients as well as the constant term. After the estimation is completed, especially if all of the explanatory variables have identical units (dollars, for example), it is useful to list them in the estimated model in order of magnitude. This benefits the reader of the report by immediately informing him or her which of the variables used have the greatest impact on the dependent variable.
X1 – the first independent or explanatory variable. This is one of the variables used in explaining the movement or the value of the dependent variable. It is called the independent variable because it comes from the dataset and, ideally, does not directly depend on any other variables in the model.
ε – the error term. This accounts for any inaccuracies in the model. Since there is no such thing as a perfect model, the gap between the estimated model and the "perfect model" that predicts values of the dependent variable equal to its actual values is called the residual.
1.2. Parameter Estimation Methods
There are many methods that allow the researcher to obtain a model and its
parameters, each suited for specific situations. The word "method" really refers
to the way the parameters, βn, are estimated. Such approaches include: Ordinary
Least Squares (OLS), Generalized Least Squares (GLS), Weighted Least Squares
(WLS), Two- and Three-Stage Least Squares (2SLS and 3SLS) and the Generalized
Method of Moments (GMM). This book will mostly employ Ordinary Least
Squares, which can be modified by, for example, adding cross-section and time
fixed or random effects, as it is the most common and the easiest one to use,
and it adequately serves the purpose of estimating models like the ones discussed
in this book.
For detailed explanations of OLS and other approaches, as well as detailed
mathematical explanations of the concepts covered in this work, we suggest referring
to books that are theory-heavy. One book worth recommending is Econometric
Models and Economic Forecasts by Robert S. Pindyck and Daniel L. Rubinfeld.7
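For readers who would like to see the mechanics of OLS outside EViews, below is a minimal sketch in Python using the statsmodels package. The data are simulated and the variable names (gdp, imports) are purely illustrative, not taken from the book's datasets.

```python
# A minimal OLS sketch; the data are simulated and the names illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
gdp = rng.normal(100, 10, size=50)                    # hypothetical explanatory variable
imports = 5 + 0.8 * gdp + rng.normal(0, 2, size=50)   # hypothetical dependent variable

X = sm.add_constant(gdp)              # adds the constant term (beta_0)
model = sm.OLS(imports, X).fit()      # estimates beta_0 and beta_1 by Ordinary Least Squares
print(model.params)                   # estimated constant and coefficient
print(model.summary())                # t-statistics, p-values, R-squared, etc.
```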
1.3. Hypothesis Testing
Hypothesis testing is used to statistically test models and their parts as well
as other statistics-based questions, e.g., the difference of means of two collected
data sets.
The hardest step in performing a statistical test is correctly setting up the
two hypotheses. The first one is called the null hypothesis (H0) and the second
one, adequately, is referred to as the alternative hypothesis (HA or H1), which,
as can be expected, states the opposite of the null. When performing a test,
the decision is made whether to reject or fail to reject the null hypothesis. The
decision depends on the decision rule, which states that the null hypothesis is to
be rejected if the observed value exceeds the critical value and if the p-value is
less than the established level of significance.
The p-value represents the probability that, given the random sample, the
difference between sample means is as large as, or larger than, the one
observed. The level of significance is how much error we are allowing to exist
in the model. At 5% a test is much more restrictive than at 10%. Depending
on the area of research, different levels can be implemented.8 For example,
in marketing the levels will be greater than when performing research in the
medical field.
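As an illustration of the decision rule, here is a sketch of a two-sample difference-of-means test in Python with SciPy. The samples are simulated and the 5% level is chosen only for the example.

```python
# A sketch of a difference-of-means test; the two samples are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample_a = rng.normal(10.0, 2.0, size=40)
sample_b = rng.normal(11.0, 2.0, size=40)

t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
alpha = 0.05                                # the chosen level of significance
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0 (the means differ)")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```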
1.4. Using Statistical Tables
When performing a statistical test, the researcher, through the use of
appropriate formulas, arrives at the observed value that he or she then compares
with the critical value, which is obtained from statistical tables – tables that list
values for different distributions based on degrees of freedom and the level of
significance (more on this in later parts of the book). The main distributions include:
the t-distribution, the F-distribution and the Chi-square (χ2) distribution. The way of using
these tables is explained in the book on the first occasion each is used. All
tables are included at the end of the book.
7 Pindyck, Rubinfeld (1998).
8 Do not worry, all this will be clear when we move to examples.
1.5. Econometrics Software
There are many different econometrics software packages available, each
with unique strengths, limitations and designations (e.g., SPSS is widely used
in the social sciences). EViews, SAS,9 STATA,10 and SPSS,11 just to name a few, are the
most common ones used. In this book, all of the outputs, model graphs and
estimations will be done using the EViews package, which can be acquired at
http://www.eviews.com/. One of the main advantages of this software is that it
is very easy to use as well as visual in providing econometric solutions. Another
benefit is that it allows working with the most common types of datasets (each of
which is explained in detail in the Data Types and Structural Equations Description
chapter). Lastly, it also allows using many statistical methods, some of which
were mentioned in this chapter's section titled Parameter Estimation Methods.
9 For more information see: http://www.sas.com/.
10 For more information see: http://www.stata.com/.
11 For more information see: http://www-01.ibm.com/software/analytics/spss/.
CHAPTER TWO
Data Types and Structural Equations Description
Depending on the type of research, datasets – the way the data is arranged
– can be divided into three main categories: time-series, cross-section and panel
data. Each econometrics project that aims at estimating the model requires
setting up a structural equation that can be viewed as a skeleton on which the
final model will be constructed. Such equations have their unique specifics that
depend on the data set that is being used.
2.1. Cross-section, Time-series and Panel data defined
Cross-section data looks at many variables at one particular moment in
time, a snapshot of the entire situation. That is why we say that it is a onedimensional dataset. An example of such data would be an attempt to estimate
the price of the house by looking at individual factors of the house sold (number
of rooms, size of the house, and presence of a pool, for example). To perform
such research, a dataset would consist of many observations (houses), each with
a sale price as well as above-suggested data points.
Time-series data is one that observes a particular variable (or a few variables)
over a set time period (for example, the U.S. imports from the first quarter
of 1960 to the last quarter of 2010); the letter t is usually used to depict the
period in which the measurement was taken. Time-series data is used in most
macroeconomic models (the U.S. GDP as a function of consumer spending,
trade deficit, government spending and whether the country is in a recession or
not, for example).
Panel data consists of observations of subjects (our dependent variables)
over a specified period of time; it is a combination of the cross-section and time-series sets.
A good example is a set of data looking at profits of the top 10 transportation
companies in the U.S. over 10 years. Of course, each of the firms comes with its own
set of explanatory data points – an example of such a set is included in Table 1.
Table 1. An example of panel data with averages per firm and per year listed in the last row and the last column, respectively

Year / Firm      Firm 1   Firm 2   ...   Firm 10   Industry's Average
2000             10       42       ...   44        32.00
2001             22       15       ...   21        19.33
2002             53       62       ...   37        50.67
...              ...      ...      ...   ...       ...
2008             17       20       ...   52        29.67
2009             9        11       ...   5         8.33
Firm's Average   22.2     30       ...   31.8
Source: Authors’ own table on original theoretical data.
Some of the advantages of panel data are: 1) a large number of data
points (allowing for increased accuracy and additional degrees of freedom), and
2) a combination of time-series and cross-section approaches that minimizes
the probability of the omitted-variables problem. In the firm example, the use of
panel data allows the researcher not only to measure the variation in profits of
a single company over time but also to measure the variation in profits between
companies. The sophistication of panel data is also the source of its problems,
as it brings together issues from both cross-section and time-series sets.
This book focuses on research done using cross-section and time-series sets of
data. For a detailed look at working with panel data, there is no better place than
Econometric Analysis of Cross Section and Panel Data by J.M. Wooldridge.1
2.2. Structural Equation Description
The structural equation is the basic representation of the model to be
estimated. It provides the reader with a quick mathematical view of what it is
that the research is going to do and what it is trying to achieve.
2.2.1. Cross-Section structural equation
For a simple cross-section dataset, a structural equation in linear form that
attempts, for example, to model house sale price (SalePrice) as a linear function
of the area of the house (Area), number of bedrooms (Beds) and the existence
of an in-ground pool (Pool) will look like Equation 2:
1 Wooldridge (2010).
Equation 2. Simple linear form structural equation for working with a cross-section data set, with i representing a specific observation
SalePricei = β0 + β1Areai + β2Bedsi + β3Pooli + εi
Source: Authors’ own equation.
The interpretation of coefficients obtained with a model in simple linear
form is very straightforward – a one-unit increase in the independent variable X
will impact the dependent variable Y by βX of its units, where βX is the coefficient
of the X independent variable. For example, referring to Equation 2, assume
that the sale price of a house (SalePrice) is reported in U.S. dollars, the area
of a house (Area) is reported in square meters and the value of β1 is 1520.
In this case the interpretation of β1 is as follows: an increase in the area of the
house by one square meter will increase the price of the house by 1,520 U.S.
dollars.
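Although the book carries out all estimations in EViews, Equation 2 can also be sketched in Python; the small DataFrame below is hypothetical and serves only to show where the coefficients come from.

```python
# A sketch of estimating Equation 2; the house data below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "SalePrice": [250000, 310000, 180000, 420000, 295000, 350000],
    "Area":      [120, 150, 90, 200, 140, 170],   # square meters
    "Beds":      [3, 4, 2, 5, 3, 4],
    "Pool":      [0, 1, 0, 1, 0, 1],              # dummy: 1 = in-ground pool
})

model = smf.ols("SalePrice ~ Area + Beds + Pool", data=df).fit()
print(model.params)   # Intercept (beta_0) and the three slope coefficients
```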
A semi-log form (Equation 3 and Equation 4) of the same model would have
at least one variable (dependent or independent) in the model logged – common
practice is to either log the entire right- (linear-log form) or left-hand (log-linear
form) side and to use the natural logarithm, ln.
Equation 3. Simple semi-log form structural equation for working with a cross-section data set, with i representing a specific observation – log-linear form
lnSalePricei = β0 + β1Areai + β2Bedsi + β3Pooli + εi
Source: Authors’ own equation.
Equation 4. Simple semi-log form structural equation for working with a cross-section data set, with i representing a specific observation – linear-log form
SalePricei = β0 + β1lnAreai + β2lnBedsi + β3lnPooli + εi
Source: Authors’ own equation.
In case of semi-log forms, the interpretation of the coefficient is a bit more
complicated.
Starting with the log-linear form (Equation 3), a one-unit increase in the
independent variable X will impact the dependent variable Y by 100βX %, where
βX is the coefficient of the X independent variable. Holding all the assumptions
from the linear-form example, let us assume now that β1 equals 0.20. In this case
the interpretation of β1 is as follows: an increase in the area of the house by one
square meter will increase the price of the house by 20%.
Moving to the linear-log form (Equation 4), a one-percent increase in the
independent variable X will impact the dependent variable Y by 0.01βX of its
units, where βX is the coefficient of the X independent variable. Given a β1 value
of 3000, its interpretation is as follows: an increase in the area of the
house by 1% will increase the price of the house by 30 U.S. dollars.
A log-log form (also known as a full-log form, Equation 5) has all variables
in logs.
Equation 5. Simple full-log form structural equation for working with
a cross-section data set, with i representing a specific observation
lnSalePricei = β0 + β1lnAreai + β2lnBedsi + β3lnPooli + εi
Source: Authors’ own equation.
In the case of a full-log form, the interpretation is simpler, that is: a one-percent
increase in the independent variable X will impact the dependent variable
Y by βX %, where βX is the coefficient of the X independent variable. Assigning β1
the value of 4, its interpretation in Equation 5 is as follows: an increase in the area of
the house by 1% will increase the price of the house by 4%.
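These interpretation rules are not arbitrary; they follow from a short calculus argument, sketched below (using the approximation d ln Y ≈ ΔY/Y for small changes).

```latex
% Log-linear form: the coefficient is a proportional (percent) effect.
\ln Y = \beta_0 + \beta_1 X + \varepsilon
\;\Rightarrow\;
\frac{\partial \ln Y}{\partial X} = \beta_1
\;\Rightarrow\;
\%\Delta Y \approx 100\,\beta_1\,\Delta X

% Log-log form: the coefficient is an elasticity.
\ln Y = \beta_0 + \beta_1 \ln X + \varepsilon
\;\Rightarrow\;
\beta_1 = \frac{\partial \ln Y}{\partial \ln X} = \frac{dY/Y}{dX/X}
```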
2.2.2. Time-Series structural equation
When presenting the reader with a functional form of a model based on time-series
data (one with a time factor), a subscript representing the time period
should be added. The equation (Equation 6) that regresses the U.S. Imports (IM)
on the U.S. GDP (GDP), the U.S. Exports (EX) and the Change in Inventory (chg_inv)
has the following structural form.
Equation 6. Simple linear form structural equation for working with a timeseries data set, with t representing a specific year
IMt = β0 + β1GDPt + β2EXt + β3chg_invt + εt
Source: Authors’ own equation.
This structural equation, just like the model presented in Equation 2, can also
be transformed into its semi-log and full-log forms.
2.2.3. Panel structural equation
As can be expected, the structural equation of a model that is to be
estimated based on panel data will combine the features of Equation 2 and
Equation 6. For example, when attempting to model inward foreign direct
investment from the U.S. (IFDI) to six countries (i = 1, 2… 6) over ten years (from
the year 2000 to the year 2009, that is, t = 2000, 2001… 2009) as a function
of hosts’ gross domestic products (GDP), their exports (X) and costs of labor
(LCOST), Equation 7 can be used.
Equation 7. Simple linear form structural equation for working with a panel
data set, with i representing cross-section elements, i.e., host countries,
and t representing time-series elements, i.e., a specific year
IFDIit = β0 + β1GDPit + β2Xit + β3LCOSTit + εit
Source: Authors’ own equation.
This structural equation, just like the model presented in Equation 2, can also
be transformed into its semi-log and full-log forms.
Notice that in all of the above-shown cases, the constant, β0, does not have
a cross-section or a time subscript, unlike the error term, εit.
CHAPTER THREE
Variables and Dataset Work
Proper treatment of variables is probably one of the most crucial steps in
setting up a successful project. Indistinct names, errors, missing values
and other mistakes are bound to occur and can invalidate the entire
research. This section aims to show how to avoid such pitfalls. Additionally,
it is very important and useful, for reference and further research, that all
steps and changes are documented as work is conducted.
3.1. Naming Variables
After the literature review and establishing which variables are going to play
a role in obtaining the model, the next step is to properly name them all.
There are two rules to doing so properly:
1) keep it short – it is very likely that the names will have to be entered multiple
times while using the software package,
2) be sure that you can recognize the name.
For example, if one of your variables is Disposable Income, naming the
variable disposable_income is not efficient (rule 1 violation) and naming it Yd,
if you are not familiar with using the letter Y to represent income, infringes on
the second rule.
Again, writing everything down is crucial. Table 2 represents an example of
a good way of keeping track of your variables.
Table 2. Variables Info Table

Name                     Symbol in the model   Unit                                  Source of data   Transformations
Gross Domestic Product   GDP                   Constant 2000 United States Dollars   World Bank       NA
Source: Authors’ own table.
3.2. Examining Variables
When dealing with time-series data, it is a good practice to examine how the
variable moves over time.
For example, the gross domestic product (GDP) data for the United States in
its graphical representation is shown in Graph 1.
Graph 1. The U.S. gross domestic product (left-hand axis in billion, USD)
Source: Authors’ own graph of data from International Monetary Fund.
A simple analysis of the variable presented in Graph 1 should follow for
each of the variables. An example of such an analysis is: As expected, as time
progresses, GDP increases; therefore, it has an upward trend and it appears to
be a non-stationary variable.
When looking at time as the only component, it is useful to add a trendline
(this can be done in Microsoft Excel, for example1).2
1 This can be done by right-clicking the line on the graph and choosing "Add Trendline…". Here, a regression can be fitted in various forms (exponential, linear, logarithmic, power, polynomial and moving average). It is also useful to check the "Display Equation on chart" box and the "Display R-squared value on chart" box – more on these topics later in the book.
2 Be very careful when analyzing and falling back on these results. As much as these tools are helpful in providing some insight, these insights are very limited as the presented model uses the horizontal axis' variable, in this case time, only to model the vertical axis' variable.
Graph 2. The U.S. gross domestic product (left-hand axis in billion, USD)
with a linear trendline
Source: Authors’ own graph of data from International Monetary Fund.
A visual analysis can also give an indication of whether we should use
a linear, square (a parabolic shape – think of the capital letter U, upright or
upside down), cube (wave-like) or log form (a half-parabola on its side with
the tip being the intercept term) of the variable in the model.
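Outside of Excel, the same visual check can be sketched in Python with numpy and matplotlib; the GDP series below is simulated purely for illustration.

```python
# A sketch of plotting a series with a fitted linear trendline;
# the GDP series is simulated.
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1990, 2011)
rng = np.random.default_rng(2)
gdp = 5000 + 250 * (years - 1990) + rng.normal(0, 300, years.size)

slope, intercept = np.polyfit(years, gdp, deg=1)      # linear trend
plt.plot(years, gdp, label="GDP")
plt.plot(years, intercept + slope * years, "--", label=f"trend: {slope:.0f}/year")
plt.xlabel("Year")
plt.ylabel("GDP (billion USD)")
plt.legend()
plt.show()
```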
3.3. Stationarity Test
In the case of cross-section data, since there is no time factor, this analysis can
be skipped.
When dealing with time-series data, a variable needs to be tested and
corrected for nonstationarity. By definition, a stationary variable will have its
mean, variance and autocorrelation constant over time.
There are three general tests to see if the variable is stationary:
1) visual test (also known as the ocular test, which is the easiest),
2) correlogram,
3) Augmented Dickey-Fuller test.
The ocular test can be done by plotting the data in levels – without any
adjustments – as presented in Graph 1. If an average line drawn (in this case the
linear trend line, Graph 2) is not close to a horizontal line, the data is considered
to be not stationary.
Table 3. An example of a correlogram of data with a unit root present

Autocorrelation      Partial Correlation
.|*******            .|*******            1
.|******             .|.                  2
.|*****              .|.                  3
.|****               .|.                  4
.|***                .|.                  5
.|**                 .|.                  6
.|*                  .|.                  7
Source: Authors’ own graph based on results obtained with EViews software.
The correlogram (again, in levels), here presented in Table 3, will, in the case of
nonstationary data, have Autocorrelation bars decreasing slowly, and the Partial
Correlation will have one bar that represents a unit root. More on autocorrelation
in Chapter 5: Model Testing. In this type of output, the extent of the bar is
represented by the number of stars; the longer the bar, the more stars are used
to represent it.
Table 4. Output of the Augmented Dickey-Fuller test

                                          t-Statistic   Prob.*
Augmented Dickey-Fuller test statistic    0.885655      0.9952
Test critical values:    1% level         -3.464280
                         5% level         -2.876356
                         10% level        -2.574746
Source: Authors’ own table based on results obtained with EViews software.
The hypotheses setup for the Augmented Dickey-Fuller test used to detect
the presence of a unit root (data being nonstationary) is:
H0: the variable is nonstationary
H1: the variable is stationary
The analysis of the Augmented Dickey-Fuller output (presented in Table 4)
looks first at the test t-statistic (0.885655) and compares it with the test critical
value (negative 2.876356) at a chosen level of significance, i.e., 5%. Also, Prob.
(the p-value associated with the test) equals 0.9952, which is greater than the
one associated with a 5% level of significance, i.e., 0.05. Based on the test's
results, we fail to reject the null hypothesis and therefore conclude that the variable
in question is not stationary. This conclusion is a result of the test t-statistic being
greater than the test's critical value and Prob. being more than 0.05.
To solve the problem of nonstationarity, differencing is applied: Yt – Yt-1.
Differencing takes the past period's value (Yt-1) and subtracts
it from the current period's observation (Yt).
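The whole workflow – test in levels, difference, test again – can be sketched in Python with statsmodels; the series below is a simulated random walk, so it contains a unit root by construction.

```python
# A sketch of the Augmented Dickey-Fuller test before and after differencing;
# the series is a simulated random walk (nonstationary by construction).
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(size=200))          # random walk in levels

adf_stat, p_value = adfuller(y)[:2]
print(f"levels:      ADF = {adf_stat:.3f}, p = {p_value:.4f}")   # expect p > 0.05

dy = np.diff(y)                               # first difference: Y_t - Y_(t-1)
adf_stat, p_value = adfuller(dy)[:2]
print(f"differenced: ADF = {adf_stat:.3f}, p = {p_value:.4f}")   # expect p < 0.05
```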
Graph 3. A graphical representation of the U.S. GDP after it has been
transformed into a stationary variable via first-order differencing; D (GDP)
Source: Authors’ own graph based on results obtained with EViews software.
Table 5. A correlogram of the U.S. GDP after it has been transformed into a stationary variable

Autocorrelation      Partial Correlation
.|**                 .|**                 1
.|**                 .|*                  2
.|*                  .|.                  3
.|*                  .|.                  4
.|.                  .|.                  5
Source: Authors’ own table based on results obtained with EViews software.
Table 6. The Augmented Dickey-Fuller test output testing the 1st difference of the U.S. GDP for stationarity (only relevant information included)

                                          t-Statistic   Prob.*
Augmented Dickey-Fuller test statistic    -5.477582     0.0000
Test critical values:    5% level         -2.876356
Source: Authors’ own table based on results obtained with EViews software.
Stationary data will have a graph with the overall linear trend
being nearly horizontal (Graph 3), a quickly decaying correlogram (Table 5) and an Augmented
Dickey-Fuller output (Table 6) with the test t-statistic (-5.477582) being
less than (greater in absolute value than) the critical value at the desired confidence level
(-2.876356), with Prob. = 0.00.
Sometimes taking the first difference is not enough. If that is the case,
differencing should be repeated until the data is proved to be stationary. This
should be done within the realm of reason; the second degree is usually the
highest degree of differencing.
3.4. Correlation Matrix Analysis
Next should come the analysis of the correlation matrix (a table of
correlation coefficients between variables). This analysis has the following goals:
1) to see if there is a linear relationship between the dependent variable and
chosen independent variables,
2) to see the relative strength of the relationship,
3) to see the sign of the relationship,
4) to assess the possibility of multicollinearity.
Table 7. A correlation matrix for the number of the U.S. FDI firms and the GDP in two regions in Poland

                      Pearson Correlation   Sig. (2-tailed)
DOLNOŚLĄSKIE          -0.246                0.639
KUJAWSKO-POMORSKIE    0.909                 0.012
Source: Authors’ own table based on results obtained with SPSS software.
Let us go over points 1 through 4 by looking at the example data shown in
Table 7.
The null hypothesis states that the coefficient of correlation is equal to zero,
i.e., that there is no linear correlation between the two tested
variables. In the example, the p-value for the correlation coefficient of -0.246
between the number of the U.S. FDI firms in the Dolnośląskie region and that
region’s GDP is equal to 0.639. Since this value is significantly above any logical
and practical level of significance, the conclusion is that there is no evidence to
state that there is a linear relationship between the two tested variables. When
looking at the Kujawsko-Pomorskie region, since the p-value is equal to 0.012,
i.e., less than the one set at a 5% level of significance (0.05), a statement can be
made that there is a high, positive and statistically significant linear correlation
between the two tested variables for this region. When describing correlation
between two variables, it is important to make a note of three facts: one, the
strength of the correlation; two, the direction (is the correlation coefficient
positive/negative, suggesting that as one variable increases the other increases/
decreases); and three, the statistical significance.
The correlation matrix should also be used to look at correlation coefficients
between independent variables in order to detect multicollinearity, which occurs
when one explanatory variable is highly correlated with another. For example,
if a model were to use household income and household taxes, where the
latter is a derivative of the former, there would be a strong suspicion of
multicollinearity.
The rule-of-thumb is that if the correlation coefficient (which suggests the
strength of a linear relationship between two variables) is greater than 0.8, then
we can expect multicollinearity (which is also suspected when the model has
a significantly high R-squared and very small, in absolute value, t-statistics). More
on this problem, its consequences and its solutions in the Model Testing chapter.
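A correlation matrix of this kind can be sketched in Python with pandas and SciPy; the three short series below are hypothetical and serve only to illustrate the mechanics.

```python
# A sketch of a correlation matrix plus a significance test of one
# coefficient; the data are hypothetical.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "imports": [108, 150, 230, 390, 490, 735, 1200, 2208],
    "gdp":     [543, 800, 1100, 2100, 2800, 4300, 7400, 14500],
    "exports": [95, 130, 210, 330, 355, 560, 1000, 1670],
})

print(df.corr())                              # Pearson correlations, all pairs

r, p = stats.pearsonr(df["imports"], df["gdp"])
print(f"r = {r:.3f}, p = {p:.4f}")            # H0: the correlation equals zero
```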
It is important to note that just because two or more variables are highly
correlated with each other, it does not mean that one causes another. For
example, the U.S. imports and the U.S. GDP are highly correlated, but it does
not automatically mean that one causes another. Using the same
example, the question can be asked: does a high correlation coefficient between
the U.S. imports and the U.S. GDP signify that changes in the U.S. imports
cause changes in the U.S. GDP, or do changes in the U.S. GDP cause changes in
the U.S. imports? This question can be answered by falling back on theory,
but on the basis of the correlation coefficient alone it is
impossible to answer.3
3.5. Descriptive Statistics
The next-to-last step in the analysis of variables is to look at the statistical
summary (Table 8), usually provided within the econometrics software, of all the
variables.
Mean (the average value), median (the value in the middle of the set), mode
(the most common value), extreme values (the minimum and maximum) and the
number of observations should be examined – it is important that all variables
have the same number of observations (196 in this case) as missing values will
significantly distort the estimated model’s coefficients. When looking at dummy
variables, the mean will represent the percentage of observations that were
coded with 1 (for example, 18.3673% of all observations took place during
a recession).
3 A hint into the cause-and-effect relationship can also be given by the Granger Causality test; see: Pindyck, Rubinfeld (1998), pp. 242–245.
To see if a variable has a normal distribution, the researcher can use three
statistics. First, skewness, which shows whether the mass of the distribution is
shifted to the left, with a long tail on the right (positive value), or to the right,
with a long tail on the left (negative value). Kurtosis, on the other hand, measures
how flat or how tall the distribution is, with an ideal value of 3; the lower/higher
the value, the flatter/more peaked the distribution is. Third, the Jarque-Bera statistic can be used
to test for normal distribution, with the null hypothesis stating that the variable
is normally distributed. In this example (Table 8), as the p-values (Probability) are
well below 0.05, we reject the null in favor of the alternative. Still, it needs to
be remembered that the assumption of normal distribution is an "ideal" one
and very often does not hold in the real world.4
Table 8. Descriptive statistics of the U.S. imports, the U.S. exports and a dummy variable for recession

               IM          EX          RECES
Mean           735.4768    561.9069    0.183673
Median         490.3720    355.4060    0.000000
Maximum        2208.336    1670.431    1.000000
Minimum        108.4540    94.75800    0.000000
Std. Dev.      635.9210    441.4533    0.388209
Skewness       1.049200    0.827283    1.633843
Kurtosis       2.767814    2.426459    3.669444
Jarque-Bera    36.40038    25.04337    90.86179
Probability    0.000000    0.000004    0.000000
Sum            144153.4    110133.7    36.00000
Sum Sq. Dev.   78857124    38001795    29.38776
Observations   196         196         196
Source: Authors’ own table based on results obtained with EViews software.
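The statistics in Table 8 can be reproduced for any series in Python with SciPy; the series below is simulated (log-normal, hence right-skewed) purely for illustration.

```python
# A sketch of the key descriptive statistics; the series is simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.lognormal(mean=6.0, sigma=0.8, size=196)     # a right-skewed series

print("mean:    ", np.mean(x))
print("median:  ", np.median(x))
print("skewness:", stats.skew(x))                    # > 0: long tail on the right
print("kurtosis:", stats.kurtosis(x, fisher=False))  # equals 3 for a normal variable
jb_stat, jb_p = stats.jarque_bera(x)
print(f"Jarque-Bera = {jb_stat:.2f}, p = {jb_p:.4f}")  # H0: normal distribution
```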
3.6. Hypotheses Formulation
The next step, taken after performing all of the analytical steps presented in
the previous sections, is to construct hypothesis tests for each variable based on
economic theory and the literature review.
For example, for GDP in relation to imports, the hypotheses regarding the sign
of the coefficient of the GDP explanatory variable are as follows: H0: βGDP ≤ 0 and
H1: βGDP > 0, where we want to statistically reject the null hypothesis, therefore
allowing for a statement that GDP has a positive and statistically significant
impact on the dependent variable, i.e., the U.S. imports.
4 For more see: Wooldridge (2010).
A summary of the information for all variables can be presented in the form of
a table (e.g., Table 9).
Table 9. A summary of information for the U.S. GDP variable

Variable   Name in the model   Alternative Hypothesis
U.S. GDP   GDP                 H1: βGDP > 0
Source: Authors’ own table.
3.7. Dummy Variables
It is often the case that some information cannot be directly inputted into the
model. Variables like sex (male, female), race (white, black, for example), location
(Washington, Richmond, for example) and many more need to be transformed
prior to their use.
between two periods. When looking at any variables around an economic or
a social event, a researcher may want to designate those observations that took
place prior to the event versus those that followed. For example, Poland joined
the European Union in the year 2004; as a result, a dummy variable (coded
EUDV) can be created that takes the value of zero for the years prior to the year
2004 and one for the years 2004 and after (see Table 10).
3.7.1. Dummy Variables: Example
Table 10. Dummy variable creation: European Union membership example

Year   EUDV
2002   0
2003   0
2004   1
2005   1
2006   1
Source: Authors’ own table.
Here is another example. When seeing if the sale price of a specific car, which
is the left-hand-side variable of the original structural equation (Equation 8),
depends on the sex of the buyer, the researcher should proceed with the following
course of action; first, the setup.
Equation 8. Dummy variable creation: Sale price example, original equation (no dummy variable)
SalePricei = β0 + β1X1i + . . . + βnXni + εi
where:
SalePricei – the dependent variable; the ith sale price of a specific car
βn – the coefficient of the nth independent variable X
εi – the error term
Source: Authors’ own equation.
Let us assume that we have the original data set as presented in Table 11. The
first purchase was done by a male, the second by a female and the third by a male;
the data coded this way cannot be effectively used in model determination.
The solution is to simply assign the value of 1 if the buyer was a female, and
the value of 0 if the buyer was a male.5 In this case, the original data set will be
transformed to look like the one presented in Table 12.
Table 11. Dummy variable creation: Sale price example, original data set

Sale Price   Sex
$120,000     M
$67,450      F
$87,090      M
Source: Authors’ own table on original data.
Table 12. Dummy variable creation: Sale price example, transformed data set

Sale Price   SexDV
$120,000     0
$67,450      1
$87,090      0

Notice that when a variable is a dummy variable, it is very useful to mark that fact by, for example, adding a capital DV at the end of its name.
Source: Authors’ own table on original data.
This introduces one dummy variable to the original structural equation
(Equation 8), which results in a new one (Equation 9).
5 It does not matter which sex takes which value as long as you have it clearly noted for interpretation purposes.
Equation 9. Dummy variable creation: Sale price example, original equation
(with a dummy variable)
SalePricei = β0 + β1X1i + . . . + βnXni + βn+1SEXDVi + εi
Source: Authors’ own equation.
Given how SEXDV is coded (0 for male and 1 for female), the interpretation
of its coefficient, βn+1 , is as follows:
1) if the coefficient of the dummy variable SEXDV is positive, then a statement
can be made that if the buyer is a female, the dependent variable, i.e., the
price for which the car is sold, will be higher than in the case of a male
buyer,
2) if the coefficient of the dummy variable SEXDV is negative, then a statement
can be made that if the buyer is a female, the dependent variable, i.e., the
price for which the car is sold, will be lower than in the case of a male
buyer.
3.7.2. Dummy Variables: Pitfalls
One may be tempted to solve the example from the previous section by
creating two dummy variables; namely, Male (MDV) and Female (FDV). The
first taking the value of zero if the buyer was a female and one if the buyer
was male; the second, the value of zero for a male buyer and the value of
one for a female buyer. In this case, the transformed data set will look as is
presented in Table 13 and the structural equation will take the form shown in
Equation 10.
Table 13. Dummy variable creation: Sale price example, transformed, version 2, data set

Sale Price   MDV   FDV
$120,000     1     0
$67,450      0     1
$87,090      1     0
Source: Authors’ own table on original data.
Equation 10. Dummy variable creation: Sale price example, original equation
(with two dummy variables)
SalePricei = β0 + β1X1i + . . . + βnXni + βn+1MDVi + βn+2FDVi + εi
Source: Authors’ own equation.
This procedure shows the first and most common pitfall when employing
dummy variables; namely, including all categories in the model. This creates
the problem of multicollinearity (more on this later on) as MDV and FDV are
perfectly correlated with each other: for every ith observation the two variables
sum to one, so as one changes from zero to one, the other changes from one
to zero. Obviously, one can include just one of the two
new variables; MDV showing if the buyer was male (value of one) or not (value
of zero), or FDV showing if the buyer was female (value of one) or not (value of
zero). The interpretation will be parallel to the one made in the
example from the previous section.
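In practice the coding can be automated; the pandas sketch below uses drop_first=True so that only one category enters the model, sidestepping the perfect-correlation trap just described. Note that here the kept dummy happens to be Sex_M, i.e., 1 = male, the reverse of the book's SexDV coding.

```python
# A sketch of dummy variable creation with pandas; drop_first=True keeps
# only one of the two categories and avoids perfect multicollinearity.
import pandas as pd

df = pd.DataFrame({
    "SalePrice": [120000, 67450, 87090],
    "Sex":       ["M", "F", "M"],
})

dummies = pd.get_dummies(df["Sex"], prefix="Sex", drop_first=True, dtype=int)
df = pd.concat([df.drop(columns="Sex"), dummies], axis=1)
print(df)    # a single column Sex_M remains (1 = male, 0 = female)
```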
The second common pitfall is when the researcher decides to base the
model on too many dummy variables. The rule of thumb is that the model
should not contain more than two, maximum three, dummy variables – this
of course being subject to the fact that the model does not suffer from the
problem of underspecification (too few explanatory variables) or the problem of
overspecification (too many explanatory variables).6
3.8. Data Cleaning
One of the main reasons for inspecting the data, in addition to getting a feel
for the links between variables (i.e., correlations) as well as how variables change
over time, is to determine whether there are any inconsistencies. Looking
at extreme values, for instance, allows the researcher to identify miscoded
entries. Examples would be a house with 0 square footage, 22 bathrooms
and 2 bedrooms, or an average minimum labor cost of 15 dollars an hour with
a maximum value of 115 – all clearly illogical and likely errors. Another
way of finding such values, or finding missing entries, is to sort the data according
to each variable to see which cells were left empty in the spreadsheet. Importantly,
this should be done one variable at a time to avoid distorting the
data.
Identifying the problems is straightforward whereas amending the issue can
be as easy as deleting observations and as complicated as finding alternative
ways of acquiring the missing data.
Prior to deciding on the solution, it is important to note that retrieving the
missing data is the preferred approach – this way the size of the data
set, and therefore the number of degrees of freedom, is not decreased. If the researcher
decides to look for an alternative source of data, say to complement the unit
labor cost for Poland for the year 2004 (values for other years are known), it
is crucial to look at the methodology behind the data collection of the first
source and make sure that the data point being supplemented comes
from a source that employs the same methodology. Due to differences in
methodology, values can differ by as much as 30% – this issue is evident when looking at
data on Foreign Direct Investment, for example.
6 The number of independent variables depends first and foremost on the number of observations and on the literature review.
When dealing with cross-section data, where there is no continuity between
observations, deletion of a single or even a few observations usually does not
cause concern, as long as the sample size stays large enough. Deletion,
though tempting, is not a good solution when working with time-series data,
data that has a "flow" to it. Removing an observation from a time-series set (for
example, for the first quarter of 1998, when looking at quarterly data from the year 1990
to the year 2010) creates a hole. If deletion is the only option while
working with time-series data, it has to be done on a variable basis, that is,
an entire variable for which data is missing is deleted.
As can be expected, when put into a corner, that is, when deletion of an
entire variable is not possible, it is possible to employ algorithms that
will methodologically supplement the missing data, e.g., supplementing the
data according to its simplified, linear trend. A simpler alternative is the
averaging method; see Equation 11.
Equation 11. Simple averaging method
Vt = (Vt+1 + Vt–1) / 2
Source: Authors’ own equation.
Say the situation is as presented in Table 14, where observation of GDP for
the year 2004 (GDP2004) is missing and there is no possibility of obtaining it from
another source. Deletion, as has been explained, is not an option as it distorts
continuity.
Table 14. Supplementing the missing data example, original data set

Year   GDP (in billion)
2002   4
2003   6
2004   ???
2005   9
Source: Authors’ own table based on original data.
In this case, the missing value will be 7.5 = (9+6)/2 = (GDP2005 + GDP2003)/2.
This method can be employed under the following conditions:
1) values of the missing variable continue to grow at a more-or-less steady
pace; in other words, the value preceding the missing data point is not,
for example, 5 while the subsequent value is 134,
2) the number of supplemented observations is minimal in reference to the
entire number of observations in the series,
3) the researcher employing this, or for the matter of fact any other method
regardless of its mathematical advancement, is aware of its limitations.
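Returning to Equation 11, the averaging method amounts to linear interpolation, which pandas can sketch directly; the figures below are the ones from Table 14.

```python
# A sketch of the averaging method from Equation 11 via linear interpolation;
# the GDP figures are those from Table 14.
import numpy as np
import pandas as pd

gdp = pd.Series([4.0, 6.0, np.nan, 9.0], index=[2002, 2003, 2004, 2005])
gdp_filled = gdp.interpolate(method="linear")   # consecutive years, so this
print(gdp_filled[2004])                         # yields 7.5 = (6 + 9) / 2
```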
3.9. Data Description
The purpose of describing the data is to explain to the reader everything that
he or she needs to know. This will include things like sources of data (the International
Monetary Fund, for example), the frequency and the range of the data (for time-series),
the number of observations, any transformations done to the data and the methods used
(for example, converting monthly data into quarterly data),
assumptions (if using one variable to represent another, a proxy; for example, using
daily market closing numbers to reflect customers' wealth) and the creation of
dummy variables.
CHAPTER FOUR
Model Determination
Now that the variables have been examined and the structural equation is
defined, the next step is the model estimation. This part of the book outlines
step-by-step the procedure of moving from raw data obtained from data sources
to arriving at, correcting and interpreting the final model.
There are many options when it comes to deciding which variables should
be included in the model. Usually, the explanatory variables are decided on
based on the literature review and then simply put into the model. The problem
arises when there is the issue of oversaturation of the literature with possible
determinants.1
In this situation, when there is no empirical research that dictates which
independent variables should be used, the researcher is usually forced to rely
on his or her subjective judgment, which, due to its nature, can be questioned by
others and should usually be avoided in any research. Other solutions include, but are not
limited to, stepwise approaches that add explanatory factors to an initial, very
limited model based on some statistical property. This property is usually the
maximization of the F-statistic or R-squared.
Addressing the shortcomings of the stepwise method, three main issues
should be understood. First, this method is not a substitute for a literature
review. What this means is that it picks the variables from a given evoked set,
regardless of their theoretical connection, or lack thereof, with the dependent
variable. As a result, the set of possible explanatory factors should include only
those variables that have a strong backing in the theory and in the literature
on the topic being researched. Second, new variables are added based on their
statistical importance, not their theoretical importance. As a result, the order
in which the variables are entered is not necessarily the order of importance
from the point of view of theory and/or the impact a change in the independent
variable will have on the dependent variable. Third, which variables are added
depends on which variables are already in the model. Therefore, at least some
variables should be forced in to create an initial model based on their most
common occurrence in the literature on the subject.2
1 To study a great article that shows the extent of this issue when doing research in the field of foreign direct investment see: B.A. Blonigen, J. Piger (2011), Determinants of Foreign Direct Investment, NBER Working Paper 16704.
4.1. Model Estimation with a Forward Stepwise Method
In the forward stepwise method,3 the starting point is the initial model that
consists of a small number of independent variables that were decided on based
on commonalities in the articles read during the literature review. This section
looks at the procedure from the manual point of view, that is, with all the
steps carried out by the researcher. While this can be done
automatically in such software packages as SPSS, other econometric programs
(e.g., EViews) do not have the automatic option and require the following
procedure to be conducted "by hand." All estimations are done with the Ordinary
Least Squares method of estimation.
Holding all other variables constant, the initial structural equation is presented
in Equation 12. Notice that in this example, subscripts that would designate either
cross-section (i) or time-series (t) modeling are substituted for simplicity with a.
Equation 12. Model estimation with forward stepwise method example –
initial structural, restricted equation
Ya = β0 + β1X1a + β2X2a + β3X3a + β4X4a + β5X5a + εa
Source: Authors’ own equation.
After the structural equation is estimated with econometric software, it
becomes a model. The structural representation is shown in Equation 13. Since
new explanatory factors are going to be added to this model, it is called the
restricted model. Notice that now that we are talking about an estimated
model, all the parameters that have been estimated have a hat (^) on top of
them and the error term becomes known as the residuals.
2 For more information on the stepwise approach and its limitations see:
1) B. Thompson (1989), Why Won't Stepwise Methods Die?, "Measurement and Evaluation in Counseling and Development," Vol. 21, pp. 146–149.
2) C.J. Hubert (1989), Problems with Stepwise Methods – Better Alternatives, "Advances in Social Science Methodology," Vol. 1, pp. 43–70.
3) J.S. Whitaker (1997), Use of Stepwise Methodology in Discriminant Analysis, paper presented at the annual meeting of the Southwest Educational Research Association, Austin, Texas, January 23, 1997.
3 The reason why this method is called "forward" is that the researcher starts with a small, restricted initial model and then adds new variables to it. If the opposite were the case, that is, an unrestricted model with many explanatory variables were the starting point and the objective were to statistically drop independent variables, the method would be referred to as a backward stepwise method.
Equation 13. Model estimation with forward stepwise method example –
initial structural, restricted model
Ya = β0 + β1X1a + β2X2a + β3X3a + β4X4a + β5X5a + εa
Source: Authors’ own equation.
For the sale price of the house example that has been mentioned previously,
the initial model would, for example, consist of the area of the house, location
(that is, a city or a state), its age, number of rooms and the number of baths.
In order to determine if the initial model is sufficient or not, a statistical
test, the Lagrange Multiplier (LM) test, should be implemented to check for the
presence of additional information hidden in residuals (estimates of errors). The
reason why residuals are expected to hold additional information is that the
restricted model, or any other model, only extracts the information relating to
the used independent variables. As a result, there is always some information
that is not accounted for.
The mentioned test requires an auxiliary regression. Such an equation has the residuals ($\hat{\varepsilon}_a$) from the initial model (Equation 13) as the dependent variable, which is regressed on all explanatory variables collected by the researcher. In the example, there are overall 20 possible independent variables suggested by the literature, X1–X20, for which the data has been collected. In Equation 14, the structural equation has alphas (α) that designate the parameters to be estimated, and γa represents the error of the auxiliary regression.
Equation 14. Model estimation with forward stepwise method example –
auxiliary structural equation
$\hat{\varepsilon}_a = \alpha_0 + \alpha_1 X_{1a} + \alpha_2 X_{2a} + \alpha_3 X_{3a} + \ldots + \alpha_{19} X_{19a} + \alpha_{20} X_{20a} + \gamma_a$
Source: Authors’ own equation.
The estimated auxiliary regression is shown in Equation 15.
Equation 15. Model estimation with forward stepwise method example –
auxiliary structural model
$\hat{\varepsilon}_a = \hat{\alpha}_0 + \hat{\alpha}_1 X_{1a} + \hat{\alpha}_2 X_{2a} + \hat{\alpha}_3 X_{3a} + \ldots + \hat{\alpha}_{19} X_{19a} + \hat{\alpha}_{20} X_{20a} + \hat{\gamma}_a$
Source: Authors’ own equation.
When looking at the output of the auxiliary regression, it is important to note that the variables already included in the model (X1–X5) will have low (in absolute value) t-statistics and high p-values. The null hypothesis states that all of the coefficients in the auxiliary model are equal to zero, and therefore, there is no further information to be extracted. The alternative hypothesis states that at least one of the referred-to coefficients is not equal to zero.

H0: αk+1 = αk+2 = … = αk+m = 0 (no more information to be extracted)
H1: αk+i ≠ 0 for at least some i (some information that can be added)
The LM formula that is used has a Chi-square distribution and is shown in Equation 16, where n represents the number of observations and R²aux is the R-squared statistic from the auxiliary model, described in detail in the Model Results Interpretation section of this work.

Equation 16. Lagrange Multiplier formula

LM = n · R²aux
Source: Pindyck, Rubinfeld (1998), p. 282.
The degrees of freedom are the number of all available variables minus the number of variables used in the model being tested, that is, the initial (restricted, Equation 13) model (20 − 5 = 15 in this example).
Table 15. A section of the Chi-square table with error levels in the first row and degrees of freedom in the first column

Right tail areas for the Chi-square Distribution

df\area    0.25        0.1         0.05
1          1.3233      2.70554     3.84146
14         17.11693    21.06414    23.68479
15         18.24509    22.30713    24.99579
16         19.36886    23.54183    26.29623
Source: Authors’ own table.
The first step after the calculation of the LM statistic is completed is to find the critical value. In our example, χ²critical for 15 degrees of freedom (the size of the set of possible explanatory variables net the number of independent variables used) at 5% (0.05) is 24.99579 and can be read from a Chi-square distribution table (a part of which is shown in Table 15). This value is compared with χ²observed from
the LM formula.
If the number of observations (n) is, for example, 900 and R²aux is 0.257, the LM would be 900 · 0.257 = 231.3. As a result of the test, with χ²critical being less than χ²observed (24.99579 < 231.3), the null hypothesis is rejected and a statement can be made that there is still some information to be added to the model.
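To make the arithmetic of this procedure concrete, the following is a minimal sketch of the LM test in Python (using statsmodels and scipy; the data and variable names are synthetic illustrations, not the textbook's dataset):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, k_used, k_all = 900, 5, 20           # observations, variables in model, variables available

X_all = rng.normal(size=(n, k_all))      # all candidate regressors (synthetic)
y = 1.0 + X_all[:, :k_used] @ np.ones(k_used) + 0.5 * X_all[:, 5] + rng.normal(size=n)

# 1) Estimate the restricted model with the first five regressors only
restricted = sm.OLS(y, sm.add_constant(X_all[:, :k_used])).fit()

# 2) Auxiliary regression: residuals on ALL candidate regressors
aux = sm.OLS(restricted.resid, sm.add_constant(X_all)).fit()

# 3) LM statistic and the chi-square critical value with (20 - 5) degrees of freedom
lm_observed = n * aux.rsquared
lm_critical = chi2.ppf(0.95, df=k_all - k_used)
print(f"LM = {lm_observed:.2f}, critical = {lm_critical:.2f}")
if lm_observed > lm_critical:
    print("Reject H0: residuals still contain usable information.")
```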
In order to determine which variables ought to be added to the model, the
examination of the auxiliary regression’s output should follow. The possible
explanatory variables that have the highest (again, in absolute value) t-statistics,
and therefore, the lowest p-values, should be added as they are the most
statistically significant. It is wise to add no more than two variables at a time; the safest course of action is to add one new independent variable at a time.
Let us say that one new variable, X6, has been added to the original restricted model's right-hand side. After this addition, the new model looks as presented in Equation 17.

Equation 17. Model estimation with forward stepwise method example – structural, unrestricted model

$Y_a = \hat{\beta}_0 + \hat{\beta}_1 X_{1a} + \hat{\beta}_2 X_{2a} + \hat{\beta}_3 X_{3a} + \hat{\beta}_4 X_{4a} + \hat{\beta}_5 X_{5a} + \hat{\beta}_6 X_{6a} + \hat{\varepsilon}_a$

Source: Authors' own equation.
Notice that the model, after it has been expanded with the addition of a new
explanatory element, is referred to as an unrestricted model.
At this point, the new model should be tested again with the LM test. The procedure is to be repeated until we fail to reject the null hypothesis (in other words, until χ²critical > χ²observed). At that point, a statement can be made that the final model has been achieved; it should now be tested for multicollinearity, autocorrelation and heteroscedasticity, as described in the next chapter, titled Model Testing.
CHAPTER FIVE
Model Testing
After the model is estimated, it needs to be checked. The three most common
and major problems are: multicollinearity, autocorrelation and heteroscedasticity.
This chapter provides the definition of each of these three issues and the ways of
detecting and remedying them.
5.1. Multicollinearity
Multicollinearity exists when two or more of the explanatory variables (for
example, the U.S. GDP and the U.S. Exports in the U.S. Imports estimation
example) are highly correlated with each other. Another cause of multicollinearity
is overfitting or overspecification, which suggests that the researcher was
adding independent variables simply to maximize R-squared without regard
for their statistical significance. As mentioned earlier, another common cause
of multicollinearity is associated with dummy variables. When using dummy
variables, it is important to always leave one of the categories out. If, for example,
the explained variable is believed to be dependent on the seasons of the year,
four dummy variables would be created to reflect whether the observation
took place in summer, autumn, winter or spring. But, when estimating the
model, only three of the four dummy variables would be included to avoid the
multicollinearity problem.
If perfect multicollinearity is present, the computer software will not be able to estimate the model, as one of the mathematical operations it uses (inverting the matrix of regressors) is impossible to execute.
There are two main ways of detecting this problem: one, the correlation matrix (shown in the 3.4. Correlation Matrix Analysis section, in Table 7) and, two, the examination of the regression output (discussed in detail in the next chapter). If the correlation coefficient between any independent variables is high (0.8 and above – again, a rule of thumb), multicollinearity can be a problem.
Also, if the model has a very high R-squared statistic, but the coefficients are
not statistically significant (low t-statistics and high p-values), multicollinearity
is expected.
The most common remedies are to either increase the sample size (get more
observations) or drop variables that are the least significant (highest p-values)
and/or are the main suspects of causing the problem. The latter solution needs to be performed with caution, as deletion of too many variables can lead to the problem of underspecification.
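As an illustration of such checks outside of EViews, the sketch below computes a correlation matrix and variance inflation factors (VIFs) in Python on synthetic data; treating VIF values above roughly 10 as suspect is a common rule of thumb, not a rule from this text:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
gdp = rng.normal(size=200)
ex = 0.95 * gdp + 0.1 * rng.normal(size=200)   # deliberately near-collinear with GDP
pop = rng.normal(size=200)
X = pd.DataFrame({"GDP": gdp, "EX": ex, "POP": pop})

print(X.corr().round(2))                        # flag pairwise correlations of 0.8+

exog = sm.add_constant(X)
for i, name in enumerate(exog.columns):
    print(name, round(variance_inflation_factor(exog.values, i), 1))
```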
5.2. Autocorrelation
Autocorrelation (Serial Correlation) exists when a variable is a time function of itself (today is affected by yesterday, for example) and is a problem only when dealing with time-series and panel sets of data. If the problem occurs in a cross-section set, it can be either ignored or, preferably, the order of observations can be changed to solve the problem.
The presence of autocorrelation causes the estimated coefficients of
independent variables to be inefficient (though still unbiased). In addition,
standard errors are biased and any individual hypothesis testing is invalid.
Autocorrelation of the dependent variable can be detected by the ocular test
of the residuals, the correlogram, the Breusch-Godfrey Serial Correlation LM test
and the examination of the Durbin-Watson statistic.
Graph 4. Graph of residuals of a model with the U.S. imports (IM) as the
dependent variable
Source: Authors’ own graph based on calculations conducted with EViews software.
The residuals graph may look like the one in Graph 4, in which a pattern that
suggests the presence of the problem of autocorrelation is visible. Here, one
observation appears to be dictated by the one before it. These quick changes
in the trend create sharp tips of the graph. The main benefit of this approach
is that it is quick, as it does not require any calculations. At the same time, its disadvantage comes in the form of the subjectivity of the researcher. As much as this method can be a good indicator, conclusions on the presence of autocorrelation should not be made solely based on it.
Table 16. An example of a correlogram output for the U.S. imports model

Autocorrelation    Partial Correlation    Lag   AC      PAC      Q-Stat   Prob.
.|*****|           .|*****|                1    0.751    0.751   114.36   0.000
.|**** |           .|*    |                2    0.598    0.080   187.40   0.000
.|***  |           **|.   |                3    0.375   -0.227   216.18   0.000
.|**   |           .|.    |                4    0.228   -0.020   226.93   0.000
.|*    |           .|*    |                5    0.169    0.147   232.85   0.000
Source: Authors’ own table based on calculations conducted with EViews software.
Significant bars in the Partial Correlation column of the correlogram (Table 16) suggest that there is a problem of autocorrelation. The placement of these bars serves as an indicator of which order of autocorrelation is present in the model. In this case, the first and possibly the third order of autocorrelation can be expected, as those orders have the longest bars in the Partial Correlation column. The Prob. column provides p-values for each of the autocorrelation orders. It is important to note that initially a large number of autocorrelation orders will be found statistically significant, which is a big disadvantage of this approach. The reason for this is that the third order, for example, may be caused by the first and/or the second order of autocorrelation. On the plus side, this method of detecting autocorrelation provides more information than the ocular examination, as it suggests which orders of autocorrelation can be expected.
The next possibility of detecting autocorrelation involves the use of the
Breusch-Godfrey Serial Correlation Lagrange Multiplier Test, for which the
hypotheses setup for autocorrelation looks as follows:
H0: No Autocorrelation
H1: Autocorrelation exists
Table 17. An example of the Breusch-Godfrey Serial Correlation LM test output for the U.S. imports model

Breusch-Godfrey Serial Correlation LM Test:
F-statistic      425.9017    Prob. F (2,192)        0.0000
Obs*R-squared    163.2115    Prob. Chi-Square (2)   0.0000
Source: Authors’ own table based on calculations conducted with EViews software.
The LM formula (Equation 16), as mentioned earlier, has the Chi-square distribution. For the U.S. imports example, the degrees of freedom would be 2 (the number of lagged residual terms tested, as indicated by Prob. Chi-Square (2) in the output). From the Chi-square table χ²critical = 5.99, and χ²observed = 163.2115 (which we can either calculate or read from the Breusch-Godfrey Serial Correlation LM test output – Table 17); hence, we reject the null hypothesis and conclude that autocorrelation is present. This is the preferred way of approaching the issue of testing residuals for the presence of autocorrelation as, due to its mathematical nature, it removes all subjectivity and its interpretation is clear.
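A minimal sketch of the Breusch-Godfrey test in Python follows (using statsmodels; the AR(1) errors are generated deliberately so the test has something to detect):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                 # build AR(1) errors so the test has something to find
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 2.0 + 0.5 * x + e

results = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(results, nlags=2)
print(f"Obs*R-squared = {lm_stat:.2f}, Prob. Chi-Square(2) = {lm_pvalue:.4f}")
# A p-value below 0.05 rejects H0 of no autocorrelation.
```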
The last way of determining the presence of the autocorrelation is to examine
the Durbin-Watson statistic (that is explained in more detail in the following
chapter). The ideal value is 2.00. Anything below 2.00 suggests a positive
autocorrelation and when the reading is above 2.00 it indicates the presence of
a negative autocorrelation.
There are a number of ways to correct for autocorrelation (the Generalized Least Squares method or adding more significant variables, to name just two). The easiest two to implement are the introduction of an autoregressive (AR(p)) term, where the letter p indicates the order of the serial correlation, and the introduction of a lagged dependent variable as one of the explanatory variables.
When using the AR(p) approach (Equation 18), it is important that AR(1)
through p terms are introduced. For example, if there is third order autocorrelation
(as suggested in Table 16) terms AR(1), AR(2) and AR(3) should be added to the
model (Equation 19).
Equation 18. Structural equation with an AR(p) term
Yt = β0 + β1X1t + . . . + βnXnt + δ1AR(p) + εt
Source: Authors’ own equation.
Equation 19. Structural equation with AR(p) terms 1 through 3
Yt = β0 + β1X1t + . . . + βnXnt + δ1AR(1) + δ2AR(2) + δ3AR(3)+ εt
Source: Authors’ own equation.
AR terms are subject to the same statistical significance tests as other coefficients (more on that topic in the next chapter). As much as they are easy to implement, their biggest drawback is that they are very hard, if not impossible, to interpret.
When applying the second solution, after introducing the lagged dependent variable term (Yt–1) into the equation (as shown in Equation 20), all of the original coefficients (including the constant term) need to be adjusted to properly reflect their values that have changed due to the correction.

Equation 20. Structural equation with lagged dependent variable as an additional explanatory variable

$Y_t = \beta_0 + \beta_1 X_{1t} + \ldots + \beta_n X_{nt} + \lambda_1 Y_{t-1} + \varepsilon_t$

Source: Authors' own equation.
The adjustment (Equation 21) requires dividing the original coefficient's estimated value ($\hat{\beta}_n$) by 1 minus the sum of all coefficients associated with lagged dependent variables used as explanatory variables.

Equation 21. Adjustment of the nth coefficient with r lagged dependent variables used as independent factors

$\hat{\beta}'_n = \dfrac{\hat{\beta}_n}{1 - \sum_{m=1}^{r} \hat{\lambda}_m}$

where:
$\hat{\beta}'_n$ – the adjusted value of the original coefficient, $\hat{\beta}_n$
m – index of the coefficient of the lagged dependent variable
r – number of lagged dependent variables used as explanatory variables

Source: Authors' own equation.
For Equation 20, the adjustment for the coefficient of the first independent variable would take the form presented in Equation 22.

Equation 22. Adjustment of the 1st coefficient with one lagged dependent variable used as an independent factor

$\hat{\beta}'_1 = \dfrac{\hat{\beta}_1}{1 - \hat{\lambda}_1}$

Source: Authors' own equation.
Analogously to using AR(p) terms, if higher orders of autocorrelation are expected (for example, 3rd order), all of the orders should be included in the model (1 through 3). The advantage of this method is that, despite the need for adjustment, it provides coefficients that are easy to incorporate in the interpretation and description of the estimated coefficients assigned to the used explanatory variables.
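A small sketch of the Equation 21 adjustment, with illustrative (not estimated) numbers:

```python
# Adjustment from Equation 21: divide the raw coefficient by
# 1 minus the sum of the coefficients on the lagged dependent variables.
def adjust_coefficient(beta_hat: float, lag_coefs: list) -> float:
    return beta_hat / (1.0 - sum(lag_coefs))

beta_1 = 0.30          # illustrative raw estimate of beta_1
lambdas = [0.45]       # illustrative coefficient on Y(t-1)
print(adjust_coefficient(beta_1, lambdas))   # 0.30 / (1 - 0.45) = 0.5454...
```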
5.3. Heteroscedasticity
Heteroscedasticity is the existence of different variances among random variables. A good example of this problem is the variance of consumer spending – lower income earners will have a smaller variance, while people in the upper income bracket will have a higher variance. It causes the same problems as autocorrelation.
To detect this problem, tests like the ocular test of residuals (at one end the spread of residuals will be small and it will increase as the residuals are plotted – a megaphone or cone-shaped graph), or any of the White, Goldfeld-Quandt or Breusch-Pagan LM tests can be implemented. Just like with autocorrelation or stationarity, the ocular examination of the graph should be used only as an indicator of the presence, or the lack, of the problem. Statistical tests, like the ones mentioned above, are the preferred option.
For the LM White test, for example, the hypotheses will look as follows:
H0: No Heteroscedasticity
H1: Heteroscedasticity exists
Table 18. An example of a heteroscedasticity LM White test for the U.S. imports model

Heteroskedasticity Test: White
F-statistic            40.04048    Prob. F (20,179)        0.0000
Obs*R-squared          163.4623    Prob. Chi-Square (20)   0.0000
Scaled explained SS    271.0250    Prob. Chi-Square (20)   0.0000
Source: Authors’ own table based on calculations conducted with EViews software.
An example of a statistical test for heteroscedasticity, the LM White test (shown in Table 18), suggests that the model tested suffers from the presence of heteroscedasticity and needs to be corrected for it. We reach such a conclusion because the LM test statistic, also known as χ²observed, 163.4623 (in Table 18 reflected as Obs*R-squared – the number of observations multiplied by the R-squared of the auxiliary regression), is greater than χ²critical at the 5% level of significance and 20 degrees of freedom. The decision to reject the null of no heteroscedasticity is supported by the fact that the p-value of Prob. Chi-Square (20), read from the output, is less than 0.05.
One of the popular remedies is called the Weighted Least Squares method of estimating the parameters of a model – where weights are assigned to observations to adjust for the difference in variance. The key problem with Weighted Least Squares is assigning proper weights to specific observations in such a way as not to distort the results of the research. An easy way out is provided by many software packages (EViews, for example) that offer an automatic option that cures the problem of heteroscedasticity.
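A hedged sketch of the White test and a Weighted Least Squares re-estimation in Python follows; the weighting scheme (1/income²) is an assumption made for illustration, since proper weights depend on the variance structure of the data at hand:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(3)
n = 200
income = rng.uniform(1.0, 10.0, size=n)
# Error variance grows with income: classic heteroscedasticity
y = 1.0 + 0.8 * income + rng.normal(scale=0.5 * income)

exog = sm.add_constant(income)
results = sm.OLS(y, exog).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(results.resid, exog)
print(f"Obs*R-squared = {lm_stat:.2f}, Prob. Chi-Square = {lm_pvalue:.4f}")
# p-value < 0.05 rejects H0 of no heteroscedasticity; one cure is WLS:
wls = sm.WLS(y, exog, weights=1.0 / income**2).fit()  # weights are an assumption
print(wls.params)
```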
CHAPTER SIX
Model’s Results Interpretation
Model interpretation consists of analyzing two parts of the output received after estimating the designed model using econometric software: one, the output regarding the estimation of the model's parameters (Table 19) and, two, statistics describing the model as a whole (Table 22). Each of the mentioned outputs plays a key role in assessing the estimated model. This process will give the researcher hints as to whether or not the chosen independent variables and the model as a whole, statistically, do a good job of representing the data.
For this section, the model that is used as an example is estimated based on
the following linear structural equation, Equation 23.
Equation 23. Linear structural equation of the model used in Model’s
Results Interpretation chapter
IMt = β0 + β1YDt + β2POPt + β3Wt + β4GDPt + β5EXt + εt
Source: Authors’ own equation.
6.1. Interpreting and Testing Variables and Coefficients
Table 19. Coefficient estimation output from software after estimating the
U.S. imports
Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           3376.174       278.8045      12.10947       0.0000
YD          0.142595       0.065082      2.191002       0.0296
POP         -0.024214      0.001978      -12.24395      0.0000
W           0.032785       0.007324      4.476295       0.0000
GDP         0.299346       0.067465      4.437042       0.0000
EX          0.21193        0.06036       3.51112        0.0006
Where the dependent variable, the U.S. imports (IM) is being regressed on the constant
term (C), Disposable Income (YD), the U.S. Population (POP), Wealth (W), the U.S. GDP
(GDP) and the U.S. Exports (EX).
Source: Authors’ own table based on calculations conducted with EViews software.
In Table 19, the Variable column lists all the explanatory variables entered into the model as well as the constant term; the Coefficient column lists the estimated values of the coefficients of the independent variables as well as the constant term; Std. Error represents the standard errors of the coefficients and the constant term; t-Statistic is the parameter's value divided by its standard error; and the Prob. column shows the p-value associated with each of the estimated coefficients and the constant term.
The estimated version of Equation 23, based on results presented in Table 19,
is shown in Equation 24.
Equation 24. Estimated version of the linear structural equation of the model used in Model's Results Interpretation chapter

$IM_t = 3{,}376.174 + 0.143\,YD_t - 0.024\,POP_t + 0.033\,W_t + 0.299\,GDP_t + 0.212\,EX_t + \hat{\varepsilon}_t$
Source: Authors’ own equation.
When dealing with non-probability models (ones that do not involve estimating the probability that an occurrence will take place based on given characteristics of the object making the decision, for example), the coefficients are easy to interpret – as has been shown in section 2.2. Structural Equation Description. When interpreting the coefficients, it is very important to remember that all other coefficients are held constant (ceteris paribus). An explanation of why such a statement is necessary follows. Moving away from economics, let us say that an overweight person decides to go on a diet and start an exercise program. After two months, the weight of this person has decreased by 10 kilograms. The questions are: was it due to the decrease in calories eaten, the exercise program, or maybe both, and if that is the case, which of the two, the diet or the exercise program, had a greater impact on the achieved weight loss? A parallel example can of course be found in any discipline. To bring the discussion back to economics, consider the unemployment rate, which, as we know, depends on many economic conditions; or the gross domestic product – is a change in it driven by consumer spending, investment in capital, government spending, or net exports?
If none of the variables in the model are logged, like the example in Table 19,
coefficients represent something called marginals. The marginal is interpreted
as follows: a one unit increase in the disposable income (YD) will increase the
dependent variable, the U.S. imports, by 0.142595 units; this is why it is crucial
for the model interpretation to have the units specified clearly. In the U.S.
imports example, the measurements are done in billions of U.S. 2005 dollars,
unless stated otherwise. Using that information, the above analysis of the
coefficient of the disposable income can be improved upon by stating that: In
case of the disposable income of the U.S. customers increasing by 1 billion U.S.
2005 dollars, the model suggests that the U.S. imports will increase by 0.142595
billion U.S. 2005 dollars, or by 142,595,000 U.S. 2005 dollars. Analogously, if
the population of the U.S. increases by one person (unit of measurement), the
U.S. imports will decrease by 0.024214 billion U.S. 2005 dollars; and so on.
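A minimal sketch of producing such a marginal in Python on synthetic data (the units and magnitudes below are illustrative only, not the textbook's estimates):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
yd = rng.normal(10_000, 1_000, size=200)        # disposable income, illustrative units
im = 3_000 + 0.14 * yd + rng.normal(scale=50, size=200)

results = sm.OLS(im, sm.add_constant(yd)).fit()
b_yd = results.params[1]
# With no logged variables, the coefficient is a marginal effect:
print(f"A one-unit rise in YD changes IM by {b_yd:.4f} units, ceteris paribus.")
```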
The above-presented interpretation of the estimated parameters of the model is only valid if the estimated coefficients are found to be statistically significant. To check this, the output shown in Table 19 is examined again.
By looking at the assigned value of Prob. (that is, the p-value), a statement regarding the statistical significance of an individual variable can be made. At the 5% level of significance, the cutoff point for the p-value is 0.05. The hypotheses for this test are as presented below:
H0: Xn is not statistically significant
H1: Xn is statistically significant
If the p-value of the estimated coefficient of the nth variable is less than 0.05, we
reject the null hypothesis and state that the nth variable is statistically significant.
Any Prob. reading above the cutoff point fails to reject the null because, as will be
shown in a bit, its coefficient is not significantly different from zero.
In addition to looking at the p-value, each of the coefficients can be tested for its significance with a t-test. If, for example, it is expected that the coefficient of the U.S. GDP variable ($\hat{\beta}_{GDP}$) will be positive (in other words, it is expected that as GDP increases, the imports of the U.S. will also increase), it is important to test whether the estimated coefficient, $\hat{\beta}_{GDP}$, actually is, as expected, greater than zero.
The test for the significance of the calculated coefficient of the nth variable
should have the following steps:
1) set up the null and the alternative hypotheses statements,
2) select the critical value based on the confidence interval,
3) compute the tobserved and compare against tcritical,
4) make a statement about rejecting, or failing to reject, the null hypothesis.
This is called the one-tail t-test, as there is some expectation as to the sign of the tested coefficient.
Table 20. Summary of the coefficient testing procedure for one-tail tests

If the variable is expected to have    Hypothesis statement
a negative coefficient                 H0: βn ≥ 0;  H1: βn < 0
a positive coefficient                 H0: βn ≤ 0;  H1: βn > 0

Formula:

$t_{\hat{\beta}_n} = \dfrac{\hat{\beta}_n - \beta_{test}}{S_{\hat{\beta}_n}}$

Where $S_{\hat{\beta}_n}$ is the standard error of the coefficient of the nth variable and βtest is the value that βn is compared to; in this case βtest = 0.
Source: Authors’ own table with formula from Pindyck, Rubinfeld (1998), p. 112.
The degrees of freedom are equal to n – k (number of observations less the
number of explanatory variables in the model). If, for example, the number of
observations is 50 and the number of independent variables in the model is
20, then the degrees of freedom are 30 and the t-statistic at 5% is 1.697. If
tcritical is less than tobserved, we reject the null and confirm that the coefficient of
the nth variable is consistent with the assumptions made based on the economic
theory.
It is possible to test whether, statistically, the coefficient of the nth variable is greater than, less than, or equal to a specific value. For the first two tests, simply choose the appropriate hypothesis statement from Table 20 and substitute the value that $\hat{\beta}_n$ is being tested against for βtest in the formula.
For example, to test if the coefficient of disposable income (YD in Table 19,
with 200 observations) is statistically greater than 0.01, the test would look as
follows:
Hypothesis setup:
H0: βn ≤ 0.01
H1: βn > 0.01
The t-test is shown in Equation 25.
Equation 25. Example of the t-test

$t_{\hat{\beta}_{YD}} = \dfrac{0.142595 - 0.01}{0.065082} = 2.0373$
Source: Authors’ own equation.
Conclusion
Since tobserved (2.0373) is greater than tcritical (1.6448), we reject the null hypothesis in favor of the alternative hypothesis, and therefore state that, statistically, the coefficient of the disposable income variable is greater than 0.01.
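The same test can be scripted; the sketch below assumes 200 observations and six estimated parameters, so its exact critical value (about 1.653) differs slightly from the large-sample value of 1.6448 used above:

```python
from scipy.stats import t

beta_hat, beta_test, se = 0.142595, 0.01, 0.065082   # values from Table 19
df = 200 - 6                                          # assumed: observations minus parameters

t_observed = (beta_hat - beta_test) / se
t_critical = t.ppf(0.95, df)                          # one-tail test at 5%
print(f"t_observed = {t_observed:.4f}, t_critical = {t_critical:.4f}")
if t_observed > t_critical:
    print("Reject H0: the coefficient is statistically greater than 0.01.")
```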
To test whether the coefficient is statistically different from a given value, the two-tail t-test is used. This test is also used when there is no inclination, or hint, as to whether a positive change in a tested independent variable will have a positive or a negative impact on the dependent variable. The two-tail t-test mimics the one-tail t-test, with the adjustments listed in Table 21.
Table 21. Summary of the coefficient testing procedure for two-tail tests

Hypothesis statement:   H0: βn = 0;  H1: βn ≠ 0

Formula:   $t_{\hat{\beta}_n} = \dfrac{\hat{\beta}_n - \beta_{test}}{S_{\hat{\beta}_n}}$

t-statistic example: given d.f. = 30, at 5% the t-statistic = 2.042
Source: Authors’ own table with formula from Pindyck, Rubinfeld (1998), p. 112.
The decision rule for null’s rejection (tcritical < tobserved) stays the same.
Sometimes, the one- or two-tail t-test, which is used for only a single variable at a time, will find that variable statistically insignificant. Yet, there are times when the same variable combined with another one (or two, or three) will be, as a group, found statistically significant, and therefore the model will be improved by their addition. To test for the combined significance (or joint significance) of two or more variables at the same time, the F-test with the F distribution is used. In principle, the F-test compares the restricted model (the one to which new variables are to be added, Equation 26) to the unrestricted model (Equation 27) containing the new independent variables.
Equation 26. Joint significance test – structural model, restricted

$Y_t = \hat{\beta}_0 + \hat{\beta}_1 X_{1t} + \hat{\beta}_2 X_{2t} + \hat{\beta}_3 X_{3t} + \hat{\varepsilon}_t$
Source: Authors’ own equation.
Equation 27. Joint significance test – structural model, unrestricted

$Y_t = \hat{\beta}_0 + \hat{\beta}_1 X_{1t} + \hat{\beta}_2 X_{2t} + \hat{\beta}_3 X_{3t} + \hat{\beta}_4 X_{4t} + \hat{\beta}_5 X_{5t} + \hat{\varepsilon}_t$
Source: Authors’ own equation.
The hypothesis statement for the joint test is as follows:
H0: β4 = β5 = 0
H1: at least one of β4, β5 ≠ 0
Similarly to the LM procedure for adding new explanatory variables to a restricted model, the null hypothesis assumes that the coefficients of the newly inserted independent variables are jointly equal to zero. The alternative hypothesis states that at least one of these coefficients is different from zero. The F-test formula is shown in Equation 28.
Equation 28. F-test formula with Error Sum of Squares

$F_{q,(n-k)} = \dfrac{(ESS_R - ESS_{UR}) / q}{ESS_{UR} / (n - k)}$
Source: Pindyck, Rubinfeld (1998), p. 129.
The UR and R subscripts designate the use of the unrestricted or restricted models, respectively; q is the number of variables tested (in this case 2, Equation 27), n is the number of observations and k is the number of explanatory variables in the unrestricted model. The principle of this test is that the Error Sum of Squares (covered later) is lower in the unrestricted model than in the restricted one if the added variables, combined, are truly significant explanatory contributors. After a few transformations (shown in Equation 29 and Equation 30), the formula can be rewritten using R-squared (R²) as presented in Equation 31.
Equation 29. R² of the unrestricted model as a function of its Error Sum of Squares and Total Sum of Squares

$R^2_{UR} = 1 - \dfrac{ESS_{UR}}{TSS_{UR}}$
Source: Pindyck, Rubinfeld (1998), p. 130.
Equation 30. R² of the restricted model as a function of its Error Sum of Squares and Total Sum of Squares

$R^2_R = 1 - \dfrac{ESS_R}{TSS_R}$
Source: Pindyck, Rubinfeld (1998), p. 130.
Equation 31. F-test formula with R-squared

$F_{q,(n-k)} = \dfrac{(R^2_{UR} - R^2_R) / q}{(1 - R^2_{UR}) / (n - k)}$
Source: Pindyck, Rubinfeld (1998), p. 130.
This formula assumes that, if the alternative hypothesis is correct, the unrestricted model explains a greater amount of variation in the dependent variable as compared with the restricted model.
The decision rule is similar to other tests: if Fcritical is less than Fobserved, the null hypothesis is rejected and, in the example above, the coefficients of X4 and X5 are jointly statistically different from zero; therefore, they add additional information to the model.
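A minimal sketch of this joint F-test in Python on synthetic data; compare_f_test in statsmodels implements the comparison that Equation 31 expresses:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
X = rng.normal(size=(n, 5))
y = 1.0 + X @ np.array([0.5, 0.4, 0.3, 0.2, 0.1]) + rng.normal(size=n)

restricted = sm.OLS(y, sm.add_constant(X[:, :3])).fit()    # X1..X3 only
unrestricted = sm.OLS(y, sm.add_constant(X)).fit()          # adds X4 and X5

# Equivalent to Equation 31: ((R2_UR - R2_R)/q) / ((1 - R2_UR)/(n - k))
f_value, p_value, q = unrestricted.compare_f_test(restricted)
print(f"F = {f_value:.2f}, p-value = {p_value:.4f}, restrictions = {q:.0f}")
# p-value < 0.05: X4 and X5 are jointly significant.
```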
6.2. Interpreting Model’s Statistics
After conducting a detailed analysis of the estimated coefficients, the model as a whole has to be reviewed using its descriptive statistics (Table 22).
Table 22. Model’s statistics output from the software after estimating
the U.S. imports by regressing them on the constant term (C), disposable
income (YD), the U.S. population (POP), wealth (W), the U.S. GDP and the
U.S. exports
R-squared             0.99315      Mean dependent var     757.2164
Adjusted R-squared    0.992974     S.D. dependent var     647.7554
S.E. of regression    54.29724     Akaike info criterion  10.85636
Sum squared resid     571948.9     Schwarz criterion      10.95531
Log likelihood        -1079.636    Hannan-Quinn criter.   10.89641
F-statistic           5625.545     Durbin-Watson stat     0.185468
Prob(F-statistic)     0.000000
Source: Authors’ own table based on calculations conducted with EViews software.
R-squared – Equation 32 – represents the percentage of the variation
in the dependent variable explained by the model. For example, in Table 22,
R-squared equals 0.99315; therefore, a statement can be made that 99.31%
of variation in the dependent variable is explained by its regression on the
independent variables. It ranges from 0 to 1. As often as this statistic is quoted, it suffers from a serious problem: it will increase as more explanatory variables are added to the model, regardless of their true significance. This issue arises because there is no adjustment for the changing degrees of freedom. To solve this issue, the Adjusted R-squared statistic was developed.
Equation 32. R-squared formula

$R^2 = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$
Source: Pindyck, Rubinfeld (1998), pp. 112–113.
Adjusted R-squared – Equation 33 – is interpreted similarly to R-squared, but it is the preferred measurement as it is adjusted for the degrees of freedom. With regular R-squared, the addition of variables, regardless of their statistical significance, will always increase it. When insignificant variables are added, the Adjusted R-squared will decrease, and unlike R-squared it can even become negative. Similar to R-squared, the higher the value of the Adjusted R-squared, the better job the model does in explaining the variation of the dependent variable.
Equation 33. Adjusted R-squared formula

$Adjusted\ R^2 = 1 - \left[ \dfrac{\sum \hat{\varepsilon}_t^2}{n - k} \right] \Big/ \left[ \dfrac{\sum (Y_i - \bar{Y})^2}{n - 1} \right]$
Source: Pindyck, Rubinfeld (1998), pp. 112–113.
Sum squared resid – the Error Sum of Squares (ESS, Equation 35) is a measure of the discrepancy between the original data ($Y_i$) and the estimated model ($\hat{Y}_i$); in other words, the variation in the residuals, or the unexplained variation. In the estimated model, ESS is equal to the Total Sum of Squares (TSS) net of the Regression Sum of Squares (RSS), as shown in Equation 34. TSS is the total variation in the dependent variable (Equation 36) and RSS is the explained variation (Equation 37).
Equation 34. Total Sum of Squares decomposition

$TSS = RSS + ESS$
or
$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

Source: Pindyck, Rubinfeld (1998), p. 89.

Equation 35. Error (Residual) Sum of Squares

$ESS = \sum (Y_i - \hat{Y}_i)^2$

Source: Pindyck, Rubinfeld (1998), p. 89.

Equation 36. Total Sum of Squares

$TSS = \sum (Y_i - \bar{Y})^2$

Source: Pindyck, Rubinfeld (1998), p. 89.

Equation 37. Regression Sum of Squares

$RSS = \sum (\hat{Y}_i - \bar{Y})^2$
Source: Pindyck, Rubinfeld (1998), p. 89.
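The decomposition can be verified numerically; the sketch below checks Equations 32 and 34–37 on synthetic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(size=100)
results = sm.OLS(y, sm.add_constant(x)).fit()

y_hat = results.fittedvalues
tss = np.sum((y - y.mean()) ** 2)        # total variation (Equation 36)
ess = np.sum((y - y_hat) ** 2)           # unexplained variation (Equation 35)
rss = np.sum((y_hat - y.mean()) ** 2)    # explained variation (Equation 37)

print(np.isclose(tss, rss + ess))                # Equation 34 holds
print(np.isclose(results.rsquared, rss / tss))   # Equation 32 holds
```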
F-statistic – statistic used to measure the overall statistical significance of the
model.
Prob(F-statistic) – probability of the F-statistic. The null hypothesis states that
the model as a whole is statistically insignificant, while the alternative hypothesis
says that the model as a whole is statistically significant. If the probability of the
F-statistic is less than the level of significance the null hypothesis is rejected and
a conclusion can be made that the model as a whole is statistically significant.
Mean Dependent Variable – mean of the dependent variable.
S.D. Dependent Variable – standard deviation of the dependent variable.
Akaike, Schwarz and Hannan-Quinn criteria – information criteria that measure the goodness of fit of an estimated statistical model. The smaller the value of a criterion, the better the fit.
Durbin-Watson statistic – a guide to detecting autocorrelation, with the ideal value being 2.00 (suggesting no autocorrelation). A reading below 2.00 suggests the presence of positive autocorrelation and a reading above hints at negative autocorrelation. As described in the section on autocorrelation, this statistic comes with drawbacks that should be noted.
In addition to R-squared and other statistics, plotting the actual data, fitted data and residuals on one graph (Graph 5) provides a good representation of how the model fits the original data and how the residuals are behaving. For example, the presented graph shows that the model does a good job of fitting the data, as the fitted and the actual plots are hard to distinguish. Residuals, with the exception of the year 2009, appear to show no signs of heteroscedasticity and suggest minimal, if any, signs of autocorrelation.1 The residuals graph is also a good place to look for signs of seasonality.
Graph 5. Graph of actual values (Actual), the fitted model (Fitted); both on
left-hand side axis, and resulting residuals (Residuals); right-hand axis
Source: Authors’ own graph based on calculations conducted with EViews software.
1 Of course, these are just ocular observations and, as has been mentioned when discussing each of these problems, additional tests should be conducted before making final claims about the residuals of the model.
CHAPTER SEVEN
Forecasting
The purpose of econometric models can be seen from two perspectives: one,
to look at what took place and two, to look into the future (time-series) or to
predict values (cross-section). In other words, given certain conditions, what
should be the value of the dependent variable? Looking into the past allows the
researcher to see what variables, and in what magnitude, have contributed to
the value of the researched (explained) variable. An example would be a hedonic housing price model that, given a set of provided characteristics, estimates their direct effect on the price for which the house was sold. This allows, with a certain degree of error, estimating what a house with a given set of descriptive characteristics should sell for. The same can be applied to time-series models: given values of specific explanatory variables, the model allows for an estimation of the parameters of the explanatory variables, which then can be used to simulate what the value of the dependent, researched, variable will be given the values of the independent, explanatory variables.
7.1. Forecasting as a Model Testing Tool
There is no testing like testing in the field. In addition to testing the model
by looking at its statistics (R-squared, for example) another form of testing the
model is by using it as a forecasting tool. The problem with testing the model
using conventional forecasting is that it cannot be done immediately; it has to
be done at the future (ex-ante or out-of-sample) time when model forecasts can
be compared with actual numbers. For example, if we were to estimate a model with Poland's imports as the dependent variable based on data from 1990 to 2010, and the estimation itself took place in the year 2010, we would have to wait until new observations of the independent variables could be collected (if the data is annual, then the year 2011), plug them into the model and compare the value obtained from the model with the actual record of Poland's imports.
The solution to this problem is ex-post forecasting, or forecasting within the
dataset available. In order to do this, prior to estimating the model, a sample
has to be properly set up.
Figure 1. Division of the original data set into Estimation Period, Ex post
and Ex ante sections
[Timeline: T1 ——— Estimation period ——— T2 ——— Ex post forecast period ——— T3 (present time) ——— Ex ante forecast period →]
Source: Authors’ own graphic based on Pindyck, Rubinfeld (1998), p. 203.
Usually, the model is estimated on all data that is available at the moment the research is being conducted (in Figure 1, T1 to T3). There are many good reasons why; two big ones are: one, to have the biggest data set possible, which in turn increases precision as well as allows for the use of more explanatory variables due to the increased number of degrees of freedom; and two, to capture the most recent trends (for obvious reasons, an estimation of a model on data from the years 1960 to 1970 to reflect current trends misses the point completely).
Data permitting, some observations, ideally the most recent ones as they
are the closest to what comes next, should be left out of the data used to
estimate the model. That is, the model should be estimated on data from T1 to T2 (Figure 1), leaving observations from T2 to T3 for testing the model via ex post forecasting.
To perform a forecast, or to test the model ex post, the easiest way is to simply plug the values of the explanatory variables from T2 to T3 (Figure 1) into the model with estimated parameters, and then to compare the results with the values of the dependent variable from that data frame. The plug-in method can also be used to forecast ex ante. The only difference is that the question is not how close our estimated values of the dependent variable are to their corresponding actual values, but, given the values of the independent variables, what the value of the dependent variable would be in, for example, T3+1.
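A minimal sketch of the plug-in ex post procedure in Python (the data is synthetic, and the 100/20 split stands in for the T1–T2 and T2–T3 periods):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 120
x = rng.normal(size=n)
y = 5.0 + 2.0 * x + rng.normal(size=n)

# T1..T2: estimation period; T2..T3: held-out ex post period
X_est, X_post = sm.add_constant(x)[:100], sm.add_constant(x)[100:]
y_est, y_post = y[:100], y[100:]

model = sm.OLS(y_est, X_est).fit()
forecast = model.predict(X_post)          # the plug-in method
rmse = np.sqrt(np.mean((y_post - forecast) ** 2))
print(f"Ex post RMSE = {rmse:.3f}")
```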
7.2. Forecasting with ARIMA
When dealing with time-series data, a very popular way of forecasting variables is through the use of an ARIMA (p, d, q), an Autoregressive Integrated Moving Average model. ARIMA models consist of the autoregressive (AR, p) and moving average (MA, q) terms, where p and q are the respective orders of those processes. The I in the model comes from differencing the data to make the variable stationary, with d being the order of integration. When the forecasted variable is stationary to begin with (d = 0), ARIMA becomes ARMA (p, q).
ARIMA as a tool has significant advantages. First, it does not require any explanatory variables; all one needs is the dependent variable itself. Second, this makes the analysis quick to conduct and, when needed, not time consuming when the procedure has to be repeated (which comes in handy, as will be shown a bit later). The obvious drawback of ARIMA is that, as it depends on the order of observations, it can only be used with time-series data. Also, as no independent variables are used, this analysis does not provide any information on the determinants of changes in the variable of interest.
There are four steps that need to be followed in order to achieve an effective
ARIMA model:
1) test and correct the variable for nonstationarity,1
2) identify the AR and MA terms.
The correlogram of the U.S. imports variable (shown in Table 23) is used to identify p and q, with the Autocorrelation column suggesting the order of the Moving Average (q) and the Partial Correlation column suggesting the autoregressive order (p).
Table 23. Correlogram of the U.S. imports variable after it has been differenced

Autocorrelation    Partial Correlation    Lag
.|**** |           .|**** |                1
.|***  |           .|*    |                2
.|*    |           *|.    |                3
.|*    |           .|.    |                4
Source: Authors’ own table based on calculations conducted with EViews software.
Looking at the above results (which are for the I(1), i.e., first-differenced, series), p = 1 and q = 2.
3) finalize the ARIMA model.
1 Described in detail in section 3.3. Stationarity Test.
Based on the correlogram (Table 23), the ARIMA (1, 1, 2) model is estimated
(results of which are posted in Table 24).
Note that since the first difference had to be taken in order to achieve stationarity, the dependent variable (the U.S. imports in this case) enters the model in its first difference.
Table 24. ARIMA (1, 1, 2) model output for the U.S. imports variable. Note that the dependent variable is not IM but d(IM) – the first difference of the original variable
Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           10.95103       2.640449      4.147412       0.0001
AR(1)       0.41178        0.146435      2.812033       0.0055
MA(1)       0.042426       0.143773      0.29509        0.7683
MA(2)       0.308006       0.089616      3.436974       0.0007
Source: Authors’ own table based on calculations conducted with EViews software.
The coefficients themselves are hard to interpret, though their statistical significance can be tested using p-values. Similarly, model statistics like R-squared can be used to evaluate the model's fit to the original data; since fitting the original data is not the main purpose of using ARIMA models, high R-squared values are unlikely.
4) forecast and test the model, adjusting when needed.
It can be hard to estimate the best ARIMA model on the first attempt, as the reading of the correlogram is subject to the researcher's interpretation. That is why using this approach is sometimes referred to as an art or a skill. If the initial model proves to be unsatisfactory, adjustments to the number of AR and MA terms can be made. Also, it is always worth checking the neighboring models, that is, ±1 AR and ±1 MA orders, when looking for the best fit and the best forecast (the word "best" being used relatively, of course).
Of course, it is useful to first test the estimated model ex post in order to evaluate it, and only then forecast ex ante.
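A minimal sketch of estimating an ARIMA (1, 1, 2) model and producing an ex ante forecast in Python (the series is synthetic and merely stands in for the U.S. imports variable):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(8)
# A synthetic trending series standing in for the U.S. imports variable
im = pd.Series(np.cumsum(10 + rng.normal(scale=5, size=200)))

# ARIMA(1, 1, 2): p = 1 AR term, d = 1 difference, q = 2 MA terms
model = ARIMA(im, order=(1, 1, 2)).fit()
print(model.summary().tables[1])          # coefficients with p-values

forecast = model.forecast(steps=8)        # ex ante forecast, 8 periods ahead
print(forecast)
```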
7.3. Forecast Evaluation
The researcher has many tools to evaluate the forecast. Two common ones are descriptive statistics such as the Proportions and the Root Mean Square Error (provided by the software, and which the estimation process aims to minimize) and the ocular test with upper and lower limits introduced.
Starting with the latter: as with any ocular test, a lot is left open to the interpretation of the examiner. As a result, setting the limits is a subjective
procedure. One common approach is to take the forecasted value and then add double the standard error of the forecast to create the upper limit, and to subtract it to create the lower limit. Plotting the original and the forecasted values with the addition of limits (example shown in Graph 6) over the ex post period shows how well the forecast fits the actual occurrences within the set boundaries. If the forecast is expected to meet more restrictive requirements, the above-mentioned limits can be created with, for example, just one standard error – and the opposite for more liberal requirements. The rule of thumb is that as long as the original values stay within the limits of the forecast, the model does a good job of forecasting the dependent variable. The same evaluation method can be applied to the plug-in method.
Graph 6. A plot of the original U.S. imports data (IM) versus the forecast
(IMF) and the upper (UP) and lower (DOWN) limits
Source: Authors’ own graph based on calculations conducted with EViews software.
From the ocular examination of the forecasted values, the used ARIMA (1, 1, 2)
model performs well over the first year; its values are nearly indistinguishable
from the actual ones. But at the end of the year 2008, the forecast loses its validity
as the actual values cross the set lower limit. It is very likely that a better ARIMA
model should be used. Moreover, this shows that the longer the forecasted
period, the greater the allowance for its error.
Moving to statistics as tools of evaluation, the first one is the Root Mean Squared Error (in the shown example equal to 267.5270), which is useful when comparing forecasts carried out with different models; the better the forecast, the lower the value of the discussed statistic. The catch is that this is a comparative statistic, i.e., it is used to compare between forecasts performed with different models, not to judge a forecast on its own. Three other statistics that should be examined are the Bias Proportion, the Variance Proportion and the Covariance Proportion. The first statistic shows the spread between the mean of the forecast and the mean of the actual data. The second one does the same but for the variation of the forecast and the actual data. The last one measures what is left, that is, the unsystematic forecasting error. For a forecast to be considered a good one, the bias and the variance proportions should be as close to zero as possible, with all the noise being collected in the covariance proportion.2 In the example, the bias proportion is equal to 0.449675, the variance proportion to 0.237572 and the covariance proportion to 0.312753; again suggesting that a better ARIMA model should be sought.
2 For more information see: Pindyck, Rubinfeld (1998), pp. 210–214.
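A hedged sketch of these evaluation statistics in Python; the numbers are illustrative, and the standard error of the forecast is approximated here by the RMSE, which is a simplification:

```python
import numpy as np

# Illustrative actual values and forecasts over an ex post window
actual = np.array([100.0, 103.0, 101.0, 107.0, 110.0])
forecast = np.array([99.0, 104.0, 103.0, 105.0, 112.0])

mse = np.mean((forecast - actual) ** 2)
rmse = np.sqrt(mse)
bias_prop = (forecast.mean() - actual.mean()) ** 2 / mse
var_prop = (forecast.std() - actual.std()) ** 2 / mse
cov_prop = 1.0 - bias_prop - var_prop     # the unsystematic remainder
print(rmse, bias_prop, var_prop, cov_prop)

# Ocular limits: forecast plus/minus two standard errors of the forecast
se = rmse                                  # a simple stand-in for the forecast SE
upper, lower = forecast + 2 * se, forecast - 2 * se
print(np.all((actual >= lower) & (actual <= upper)))
```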
CHAPTER EIGHT
Conclusions
After completing the research and describing it at appropriate length and in appropriate detail, a Conclusions section consisting of closing remarks is written. The conclusion is not the same as the abstract, which talks about what took place from the beginning to the end; the conclusion focuses more on end results and future actions.
A brief summary of the results and their comparison with conclusions drawn
from the literature review and economic theory are a good starting point.
Another common topic to be included in this segment is the discussion of any problems encountered during the work, their sources and a list of possible solutions.
One person cannot cover the topic researched in its entirety. Therefore, the
researcher should suggest the areas related to the topic in which further studies
should be conducted or parts of his or her own work that can be improved
upon.
A. Transition
At this point you have a good understanding of what it takes to get raw
data and transform it using econometrics software packages into meaningful
information. To further see how this is done, it is a good idea to take a look at
one example that carries you through all the steps.
Example
Let us work with data regarding the U.S., more specifically, its macroeconomic
conditions. Prior to starting, it is important to note three things: one, this
example is a full-length one, but it is made as short as possible by omitting some
descriptions; two, this example focuses on the econometric part of a study, as
a result the descriptive parts of the study as well as the literature review have
been omitted; and three, as this is a real-world example, that is, the data is not
staged or edited, some of the results may not look as pretty as they should.
Setup
The aim of this study is to look at what factors should be taken into consideration when explaining changes in the imports of the U.S. Therefore, the structural equation will take the form shown in Equation 38.
Equation 38. Structural equation for the U.S. imports as the dependent variable

IMt = β0 + β1X1t + β2X2t + … + βnXnt + εt
Source: Authors’ own equation.
IMt represents the U.S. imports in year t and it will be explained with the set
of potential independent variables listed in Table 25. These variables were found
through the process of the literature review.
Table 25. Potential independent variables

Name | Symbol in the model | Unit | Source of data | Note
U.S. imports (Imports of Goods and Services) | IM | Billions of Chained U.S. 2005 Dollars | U.S. Department of Commerce: Bureau of Economic Analysis | Seasonally Adjusted, Annual Rate

Independent Variables:
Real Disposable Personal Income | YD | Billions of Chained U.S. 2005 Dollars | U.S. Department of Commerce: Bureau of Economic Analysis | Seasonally Adjusted, Annual Rate
Total Population: All Ages including Armed Forces Overseas | POP | Thousands | U.S. Department of Commerce: Census Bureau | Reported monthly, transformed into quarterly to match other data
Dow Jones Index | DJ | Index | finance.yahoo.com* | Reported monthly, transformed into quarterly to match other data
Consumer Price Index For All Urban Consumers: All Items | CPI | Index, 1982–84 = 100 | U.S. Department of Labor: Bureau of Labor Statistics | Seasonally Adjusted
Exports of Goods and Services | EX | Billions of U.S. Dollars** | U.S. Department of Commerce: Bureau of Economic Analysis | Seasonally Adjusted, Annual Rate
Real Gross Domestic Product | GDP | Billions of Chained U.S. 2005 Dollars | U.S. Department of Commerce: Bureau of Economic Analysis | Seasonally Adjusted, Annual Rate
Real Change in Private Inventories | CHG.INV | Billions of Chained U.S. 2005 Dollars | U.S. Department of Commerce: Bureau of Economic Analysis |
Presence of NAFTA | NAFTA | Dummy Variable (1 – Yes, 0 – No) | |
Presence of the Gold Standard | GOLD | Dummy Variable (1 – Yes, 0 – No) | |
Presence of the recession | RECES | Dummy Variable (1 – Yes, 0 – No) | FRED*** |

* Source: http://finance.yahoo.com/q/hp?s=^DJI&a=00&b=1&c=1960&d=02&e=2&f=2010&g=m&z=66&y=594.
** Ideally, all data would be in the same constant units, but such data was not available for the U.S. exports.
*** http://research.stlouisfed.org/fred2/help-faq/.
Source: Authors’ own table.
Descriptive Statistics
Now that the set of variables to work with has been selected and the data for them has been collected, it is time to look at the descriptive statistics presented at the end of the text. The most important observation is that there is an equal number of observations (200) for all variables. As expected, none of the variables have a normal distribution, but, as discussed earlier and in the suggested reference, this is not an issue.
Hypothesis Statements
Hypothesis statements, which are based on the examined literature, are
presented in Table 26.
Table 26. Hypothesis statements for all independent variables

Variable                                                     Name in the model    Alternative Hypothesis
Real Disposable Personal Income                              YD                   H1: βYD > 0
Total Population: All Ages including Armed Forces Overseas   POP                  H1: βPOP > 0
Dow Jones Index                                              DJ                   H1: βDJ > 0
Consumer Price Index For All Urban Consumers: All Items      CPI                  H1: βCPI < 0
Exports of Goods and Services                                EX                   H1: βEX ≠ 0
Real Gross Domestic Product                                  GDP                  H1: βGDP > 0
Real Change in Private Inventories                           CHG.INV              H1: βCHG.INV > 0
Presence of NAFTA                                            NAFTA                H1: βNAFTA > 0
Presence of the Gold Standard                                GOLD                 H1: βGOLD ≠ 0
Presence of the recession                                    RECES                H1: βRECES < 0
Source: Authors’ own table.
Correlation matrix
The next step is to look at the correlation matrix (see Table 27) for high correlation coefficients between the dependent variable and the possible independent variables, as well as for signs of multicollinearity.
Table 27. Correlation Matrix for all variables

          IM      YD      POP     DJ      CPI     EX      GDP     CHGINV  NAFTA   GOLD    RECES
IM        1.00    0.97    0.94    0.98    0.94    0.98    0.97    -0.02   0.85    -0.49   0.01
YD        0.97    1.00    0.99    0.94    0.99    0.98    1.00    -0.05   0.86    -0.64   0.02
POP       0.94    0.99    1.00    0.91    0.99    0.97    0.99    -0.04   0.86    -0.68   0.01
DJ        0.98    0.94    0.91    1.00    0.91    0.97    0.95    -0.08   0.87    -0.41   0.03
CPI       0.94    0.99    0.99    0.91    1.00    0.96    0.99    -0.05   0.87    -0.64   0.01
EX        0.98    0.98    0.97    0.97    0.96    1.00    0.98    -0.05   0.89    -0.53   0.04
GDP       0.97    1.00    0.99    0.95    0.99    0.98    1.00    -0.03   0.87    -0.62   0.00
CHGINV    -0.02   -0.05   -0.04   -0.08   -0.05   -0.05   -0.03   1.00    0.03    -0.01   -0.50
NAFTA     0.85    0.86    0.86    0.87    0.87    0.89    0.87    0.03    1.00    -0.42   -0.05
GOLD      -0.49   -0.64   -0.68   -0.41   -0.64   -0.53   -0.62   -0.01   -0.42   1.00    -0.02
RECES     0.01    0.02    0.01    0.03    0.01    0.04    0.00    -0.50   -0.05   -0.02   1.00
Source: Authors’ own table based on calculations conducted with EViews software.
From the above-presented correlation table it is clear that all but three of the independent variables (change in inventories, presence of the Gold Standard and presence of the recession) are highly, positively and statistically significantly correlated with the dependent variable. Unfortunately, when looking at the correlation coefficients between the independent variables themselves, there exists a high probability of multicollinearity. As a result, attention should be paid to variables that are derived from other variables (for example, exports and gross domestic product), as only one of them – theoretically, the one that has the highest correlation coefficient with the dependent variable – should be included in the model. Additionally, the R-squared of the model and the p-values of the coefficients of included explanatory variables will be monitored for signs of multicollinearity.
Unit Root Test
As the literature shows that the most important explanatory variables are disposable income (the more money people have, the more they will buy) and population (the higher the number of customers, the higher the number of purchases), the test for stationarity is first carried out only for those two and the dependent variable. Hypothesis statements for the unit root test for each of the three variables are shown in Table 28.
Table 28. Hypothesis statements for the Unit Root tests

Variable    Null Hypothesis                        Alternative Hypothesis
IM          H0: the variable is nonstationary      H1: the variable is stationary
YD          H0: the variable is nonstationary      H1: the variable is stationary
POP         H0: the variable is nonstationary      H1: the variable is stationary
Source: Authors’ own table.
The results of the Augmented Dickey-Fuller tests1 are shown in Table 29. None of the original series was stationary in levels. Taking the first difference solved the problem for the U.S. imports and disposable income, but the variable representing the U.S. population had to be differenced twice in order to achieve stationarity.
Table 29. Results of the Augmented Dickey-Fuller test for the presence of
a Unit Root

Series        ADF test statistic    Prob.
IM                   0.493          0.986
D(IM)               -7.628          0.000
YD                   3.455          1.000
D(YD)               -8.902          0.000
POP                  1.776          1.000
D(POP, 2)           -4.427          0.000

Test critical values: 1% level, -3.463 to -3.465; 5% level, -2.876 to -2.877;
10% level, -2.575 (the exact critical values differ marginally across the six
tests with the number of usable observations).

Source: Authors’ own table based on calculations conducted with EViews software.

1 There are other tests for the presence of the unit root, but the Augmented
Dickey-Fuller test is administered as it does not suffer from the problem of
subjectivism like other tests, for example, the analysis of the graph.
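For readers replicating this check outside EViews, a minimal sketch with
statsmodels follows; im and pop are assumed to be pandas Series holding the
quarterly data (hypothetical names, for illustration only).

from statsmodels.tsa.stattools import adfuller

def adf_report(series, name):
    # adfuller returns the test statistic, its p-value, the lag count used,
    # the number of observations and the critical values, among others.
    stat, pvalue, _, _, crit, _ = adfuller(series.dropna(), autolag="AIC")
    print(name, round(stat, 3), round(pvalue, 3), crit)

adf_report(im, "IM")                        # level: non-stationary expected
adf_report(im.diff(), "D(IM)")              # first difference
adf_report(pop.diff().diff(), "D(POP, 2)")  # second difference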
Model Estimation
As mentioned earlier, thankfully, the literature review has put forward
two independent variables that are the most frequently cited in previous works,
thereby allowing for the construction of the restricted structural equation
shown in Equation 39.2
Equation 39. Restricted structural equation
D(IMt) = β0 + β1D(YD) + β2D(POP, 2) + εt
Source: Authors’ own equation.
Now that the restricted equation is properly specified, the estimation
procedure can begin. The Ordinary Least Squares method is employed to estimate
the parameters of the model. The results of the estimation are presented in
Table 30, the model’s statistics are shown in Table 31, and the resulting
structural model is shown in Equation 40.
Equation 40. Restricted structural model
D(IMt) = 5.073 + 0.142D(YD) − 0.008D(POP, 2)
Source: Authors’ own equation based on calculations conducted with EViews software.
2 Note that if the final model can be constructed based on the literature review, it is the preferred way to proceed.
Table 30. Values of the restricted model’s parameters
Variable     Coefficient   Std. Error   t-Statistic   Prob.
C              5.073         1.741        2.913       0.004
D(YD)          0.142         0.027        5.210       0.000
D(POP,2)      -0.008         0.015       -0.513       0.609
Source: Authors’ own table based on calculations conducted with EViews software.
Table 31. Values of the restricted model’s statistics
R-squared             0.130       Mean dependent var.      11.023
Adjusted R-squared    0.120       S.D. dependent var.      19.094
S.E. of regression    17.908      Akaike info criterion     8.624
Sum squared resid.    58686.390   Schwarz criterion         8.676
Log likelihood       -799.065     Hannan-Quinn criter.      8.645
F-statistic           13.662      Durbin-Watson stat.       1.168
Prob. (F-statistic)   0.000
Source: Authors’ own table based on calculations conducted with EViews software.
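For readers who want to reproduce this estimation step outside EViews, a
minimal OLS sketch in Python with statsmodels follows; data is the hypothetical
DataFrame from the earlier snippet, and d_im, d_yd and d2_pop are names assumed
here for illustration.

import pandas as pd
import statsmodels.api as sm

# The series are differenced exactly as dictated by the stationarity tests.
d_im = data["IM"].diff()
d_yd = data["YD"].diff().rename("D_YD")
d2_pop = data["POP"].diff().diff().rename("D2_POP")

X = sm.add_constant(pd.concat([d_yd, d2_pop], axis=1))
restricted = sm.OLS(d_im, X, missing="drop").fit()
print(restricted.summary())  # coefficients as in Table 30, statistics as in Table 31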
Examining the model’s statistics first, it can be said that the model as a whole
is statistically significant – Prob. (F-statistic) = 0.000 – but it is a poor
model, as it explains only 13% of the variation in the dependent variable
according to the R-squared statistic, and even less, 12%, according to the
Adjusted R-squared statistic. In addition, the model suffers from the presence
of autocorrelation, which is suggested by the Durbin-Watson statistic (1.168)
and confirmed by the Breusch-Godfrey Serial Correlation Lagrange Multiplier
test, whose null hypothesis of no autocorrelation is rejected due to
p-value = 0.000 (Table 32).
Table 32. Results of the Breusch-Godfrey Serial Correlation Lagrange Multiplier test

Breusch-Godfrey Serial Correlation LM Test
F-statistic       23.221    Prob. F (2,181)        0.000
Obs*R-squared     37.980    Prob. Chi-Square (2)   0.000
Source: Authors’ own table based on calculations conducted with EViews software.
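The same diagnostic can be sketched in Python, reusing the fitted restricted
model from the earlier snippet (an illustration of the technique, not the
authors' EViews session).

from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Breusch-Godfrey LM test with 2 lags; the null hypothesis is no serial
# correlation up to the chosen lag order.
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(restricted, nlags=2)
print("Obs*R-squared:", round(lm_stat, 3), "p-value:", round(lm_pvalue, 4))
print("F-statistic:  ", round(f_stat, 3), "p-value:", round(f_pvalue, 4))
# A p-value below 0.05 rejects the null of no autocorrelation.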
As for the coefficients of the independent variables forced in based on the
literature (Table 30), only the one assigned to disposable income is
statistically significant (p-value = 0.000), with its sign in line with the
stated hypothesis (0.142). The coefficient of population is found to be highly
statistically insignificant (p-value = 0.609) at the 5% level of significance,
and its sign is the opposite of what was expected (-0.008).3
Obviously, the model needs to be improved upon. To do so, the auxiliary
regression (Equation 41) is first estimated, with the residuals from
Equation 40 as the dependent variable and all possible explanatory variables
from Table 25 as independent factors.
Equation 41. Structural auxiliary equation
εa = α0 + α1YD + α2POP + α3DJ + α4CPI + α5EX + α6GDP + α7CHG.INV + α8NAFTA + α9GOLD + α10RECES + γa
Source: Authors’ own equation.
The results of the equation (Table 33) are as expected when looking at the
p-values of the already included independent variables: the p-value of
disposable income is very high (0.791), and the p-value for population is low,
which is expected given its lack of statistical significance in the restricted
model.
Table 33. Results of the auxiliary regression (1)
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -195.817       107.739      -1.818       0.071
YD            -0.005         0.017      -0.266       0.791
POP            0.001         0.001       1.907       0.058
DJ             0.005         0.002       2.088       0.038
CPI            0.154         0.175       0.878       0.381
EX            -0.014         0.034      -0.411       0.681
GDP           -0.016         0.015      -1.041       0.299
CHGINV         0.192         0.043       4.492       0.000
NAFTA         -0.756         7.061      -0.107       0.915
GOLD           1.819         6.596       0.276       0.783
RECES        -10.658         3.416      -3.120       0.002
Source: Authors’ own table based on calculations conducted with EViews software.
Prior to adding any new explanatory variables to the restricted model,
a statistical test with the use of a Lagrange Multiplier (number of observations
times the R-squared from the auxiliary model; 200 • 0.360679 = 72.1358) is
carried out, with the null hypothesis of no more information to be extracted
(H0: αk+1 = αk+2 = … = αk+m = 0) and the alternative that some more information
can be extracted (H1: αk+i ≠ 0 for at least some i). Since at a 5% level of
significance and 10 − 2 = 8 degrees of freedom χ2critical (15.50731) is less
than χ2observed (72.1358), the null hypothesis is rejected and a statement can
be made that there is still some information that can be extracted.

3 This may happen. Some variables work for some test subjects, in this case
countries, and some, even the ones most often used in the literature, may be
found to be highly statistically insignificant. Still, as both of the used
explanatory factors are the ones used most often in the literature, they will
stay in the model. At the same time, if none, or only a few, of the staple
independent variables work, it is wise to use other ones.
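A sketch of this Lagrange Multiplier check in Python follows; candidates is a
hypothetical DataFrame assumed to hold all ten candidate regressors from
Table 25.

import statsmodels.api as sm
from scipy.stats import chi2

# Auxiliary regression: residuals of the restricted model on all candidates.
aux = sm.OLS(restricted.resid, sm.add_constant(candidates), missing="drop").fit()

lm_observed = int(aux.nobs) * aux.rsquared  # e.g., 200 * 0.360679 = 72.1358
df = 10 - 2                                 # candidates minus regressors already included
lm_critical = chi2.ppf(0.95, df)            # 15.50731 for 8 degrees of freedom

# Reject H0 (no more information to extract) when observed exceeds critical.
print(lm_observed, lm_critical, lm_observed > lm_critical)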
From the output presented above in Table 33, the obvious choices for addition
to the unrestricted model are the variable representing changes in inventories
(p-value = 0.000) and the presence of recession (p-value = 0.002). First, the
two variables are tested for stationarity (Table 34). Both variables prove to
be stationary in levels – they do not have a unit root – as the observed test
statistics lie below the critical values for both and the p-values are smaller
than 0.05.
Table 34. Stationarity test for CHG.INV and RECES variables
CHG.INV
Augmented Dickey-Fuller test statistic     t-Statistic: -7.2201    Prob.: 0.000
Test critical values:   1% level: -3.4654   5% level: -2.8768   10% level: -2.5750

RECES
Augmented Dickey-Fuller test statistic     t-Statistic: -5.3811    Prob.: 0.000
Test critical values:   1% level: -3.4654   5% level: -2.8768   10% level: -2.5750
Source: Authors’ own table based on calculations conducted with EViews software.
After adding the newly selected independent variables to the model, the
structural equation takes the form shown in Equation 42, the estimated
parameters have the values shown in Table 35, and the model’s statistics are
presented in Table 36.
Equation 42. Unrestricted structural model
D(IMt) = β0 + β1D(YD) + β2D(POP, 2) + β3CHGINV + β4RECES + εt
Source: Authors’ own equation.
Table 35. Values of the unrestricted model’s parameters
Variable     Coefficient   Std. Error   t-Statistic   Prob.
C              3.064         2.105        1.456       0.147
D(YD)          0.094         0.024        3.844       0.000
D(POP,2)      -0.002         0.013       -0.178       0.859
CHGINV         0.209         0.041        5.142       0.000
RECES        -12.254         3.457       -3.544       0.001
Source: Authors’ own table based on calculations conducted with EViews software.
Table 36. Values of the unrestricted model’s statistics
R-squared             0.355        Mean dependent var       11.023
Adjusted R-squared    0.340        S.D. dependent var       19.094
S.E. of regression    15.507       Akaike info criterion     8.347
Sum squared resid     43,526.250   Schwarz criterion         8.434
Log likelihood       -771.272      Hannan-Quinn criter.      8.382
F-statistic           24.870       Durbin-Watson stat        1.413
Prob(F-statistic)     0.000
Source: Authors’ own table based on calculations conducted with EViews software.
As expected, both the R-squared and the Adjusted R-squared have increased,
confirming the notion that the addition of the new explanatory variables was
a good decision. The model as a whole is still statistically valid, with a
higher F-statistic whose probability equals 0.000.
Another round of tests is run to see if there is still some more information to
be extracted. As it turns out, for 10 − 4 = 6 degrees of freedom at a 5% level
of significance, χ2critical (12.59159) is less than χ2observed
(200 • 0.195209 = 39.0418); hence, the output of the second auxiliary model
should be examined for new independent variables.
Still, for the purpose of this example, let us assume that the unrestricted
model based on the structural equation shown in Equation 42 is the final model
and proceed with the tests.
Starting with the test of the model for multicollinearity, the correlation
matrix (Table 27) strongly suggests that it may prove to be an issue. Still,
given the fact that the independent variables in the model are statistically
significant (with the exception of population) and the R-squared is not
excessively high, multicollinearity is not expected to be an issue. As for
autocorrelation, the Breusch-Godfrey Serial Correlation Lagrange Multiplier
test shows that it is an issue for both the 1st (Table 37) and the 2nd
(Table 38) order.
Table 37. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (1)

Breusch-Godfrey Serial Correlation LM Test
F-statistic      15.49961    Prob. F (2,179)        0.00
Obs*R-squared    27.45655    Prob. Chi-Square (2)   0.00
Source: Authors’ own table based on calculations conducted with EViews software.
Table 38. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (2)

Breusch-Godfrey Serial Correlation LM Test
F-statistic      6.178035    Prob. F (2,177)        0.0025
Obs*R-squared    12.07182    Prob. Chi-Square (2)   0.0024
Source: Authors’ own table based on calculations conducted with EViews software.
Because there are only a few independent variables in the model, using lags of
the dependent variable as additional regressors is not advised, since the ratio
of the former to the latter would be 2:1. A solution to this issue is the
inclusion of AR(p) terms. After AR(1) was introduced, the Breusch-Godfrey
Serial Correlation Lagrange Multiplier test still showed that autocorrelation
was an issue (Table 38). Therefore, the second term, AR(2), was added; the test
results shown in Table 39 suggest failing to reject the null of no
autocorrelation, eventually yielding the output of the final model shown in
Table 40 and its statistics in Table 41. This result is supported by the
Durbin-Watson statistic (1.942) being very close to its ideal value of 2.00.
Table 39. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (3)

Breusch-Godfrey Serial Correlation LM Test
F-statistic      0.867717    Prob. F (2,175)        0.4217
Obs*R-squared    1.806768    Prob. Chi-Square (2)   0.4052
Source: Authors’ own table based on calculations conducted with EViews software.
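One way to mimic EViews' AR(1)/AR(2) terms outside EViews is to estimate the
regression with AR(2) errors via statsmodels' SARIMAX. This is a sketch under
the assumption that y holds D(IM) and X holds the four regressors prepared
earlier (hypothetical names); the exact estimates may differ slightly from the
EViews output.

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Regression with AR(2) errors: order=(2, 0, 0) adds the two autoregressive
# terms, trend="c" keeps the constant in the model.
ar_model = SARIMAX(y, exog=X, order=(2, 0, 0), trend="c").fit(disp=False)
print(ar_model.summary())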
Table 40. Values of the corrected unrestricted model’s parameters

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C              6.478         2.810        2.305       0.022
D(YD)          0.055         0.022        2.558       0.011
D(POP,2)      -0.001         0.010       -0.150       0.881
CHGINV         0.156         0.045        3.485       0.001
RECES        -15.311         3.981       -3.846       0.000
AR(1)          0.262         0.077        3.401       0.001
AR(2)          0.243         0.075        3.240       0.001
Source: Authors’ own table based on calculations conducted with EViews software.
Table 41. Values of the corrected unrestricted model’s statistics
R-squared             0.448       Mean dependent var       11.191
Adjusted R-squared    0.429       S.D. dependent var       19.129
S.E. of regression    14.457      Akaike info criterion     8.217
Sum squared resid     36991.520   Schwarz criterion         8.340
Log likelihood       -749.007     Hannan-Quinn criter.      8.267
F-statistic           23.903      Durbin-Watson stat        1.942
Prob(F-statistic)     0.000
Source: Authors’ own table based on calculations conducted with EViews software.
Another test is to see if the residuals have a normal distribution.4 This is
done by examining the Jarque-Bera statistic (5.569), with the null hypothesis
of a normal distribution. Since the p-value associated with the statistic
equals 0.059 and is just above the 5% significance cut-off (0.05), it is
possible to say that, at this level of significance, the residuals are normally
distributed.
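A sketch of the normality check in Python, assuming resid holds the residuals
of the final model (a hypothetical name):

from scipy.stats import jarque_bera

jb_stat, jb_pvalue = jarque_bera(resid)
print("Jarque-Bera:", round(jb_stat, 3), "p-value:", round(jb_pvalue, 4))
# A p-value above 0.05 fails to reject the null of normally distributed residuals.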
The last test is the White test for heteroscedasticity, with the null
hypothesis of no heteroscedasticity. Since the p-value of the test (0.086,
shown in Table 42) is above the 5% cut-off point, it can be concluded that the
model does not suffer from heteroscedasticity.
Table 42. White heteroscedasticity test for the final model

Heteroscedasticity Test: White
F-statistic            1.444     Prob. F (27,156)        0.086
Obs*R-squared         36.788     Prob. Chi-Square (27)   0.099
Scaled explained SS   44.504     Prob. Chi-Square (27)   0.018
Source: Authors’ own table based on calculations conducted with EViews software.
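The White test can be sketched in Python as well; het_white takes the residuals
and the regressor matrix, including the constant (shown here on the OLS fit
from the earlier snippets as an illustration of the technique, not the EViews
output above).

from statsmodels.stats.diagnostic import het_white

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(restricted.resid,
                                                 restricted.model.exog)
print("Obs*R-squared:", round(lm_stat, 3), "p-value:", round(lm_pvalue, 4))
# A p-value above 0.05 fails to reject the null of no heteroscedasticity.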
Moving to the model’s assessment, the estimated model explains 44.8%
of variation in the dependent variable (R-squared = 0.448). In its entirety, the
model is statistically significant – Prob. (F-statistic) = 0.000.
As for the coefficients, all but the one assigned to population (p-value =
0.881) are statistically significant at a 5% level of significance (this
includes both autoregressive terms) – the highest p-value among them, 0.011,
is associated with disposable income. The interpretation of the statistically
significant coefficients is as follows:
1) YD: If the difference in real disposable income increases by one billion (of
Chained U.S. 2005) USD, the difference5 in the U.S. imports will increase
by 0.055 billion (of Chained U.S. 2005) USD, or 55,491,000 Chained U.S.
2005 USD,
4 Remembering that, as presented earlier, this is an ideal assumption.
5 Remember that for stationarity reasons, the dependent variable had to be differenced.
2) CHGINV: If the real change in private inventories increases by one billion
(of Chained U.S. 2005) USD, the difference in the U.S. imports will increase
by 0.156 billion (of Chained U.S. 2005) USD, or 156,196,000 Chained U.S.
2005 USD,
3) RECES: If the U.S. is in a recession, the difference in the U.S. imports will
decrease by 15.311 billion (of Chained U.S. 2005) USD, or 15,311,240,000
Chained U.S. 2005 USD.
All of the hypothesis statements regarding the signs of incorporated
independent variables (as listed in Table 26) have been statistically confirmed at
a 5% level of significance.6
Additionally, let us examine the graph (Graph 7), which shows the fitted data
set against the actual data, with the resulting residuals incorporated.
Graph 7. Actual, fitted data and residuals of the final model
Source: Authors’ own graph based on calculations conducted with EViews software.
The fitted data is still off the actual data, which can be seen in the
discrepancy between the two series and in the large jumps in the values of the
residuals.7
Lastly, the model is tested ex post on the data from the first quarter of 2007
to the fourth quarter of 2009. This is done in two ways. First, the forecast
(IMF) is evaluated visually (Graph 8) by comparing it to the actual data (IM),
with the upper/lower boundary set by adding/subtracting twice the value of the
standard error of the forecast to/from the IMF value.
6 This statement is made based on the examination of the p-values of those
coefficients. Of course, t-tests can be carried out to manually prove the
referred-to statement, but in practice this is omitted to avoid repetition.
7 This is expected, as the low quality of the fit is suggested by the value of
the R-squared statistic and is due to the assumption that the analyzed model is
the final model.
Graph 8. Ex post forecast of the final model
Source: Authors’ own graph based on calculations conducted with EViews software.
The graph shows that until the third quarter of 2008 the model did a very good
job of forecasting the values of the U.S. imports. After that, the actual data
begins to deviate significantly from the forecast, which was still able to
detect the incoming downward trend with a recovery at the end. This discrepancy
between the two series can be corrected for by adding new explanatory variables
(for example, a variable correcting for the presence of the 2007 economic
crisis, which occurred at this time).
Examining the forecast’s statistics – the three key proportions (bias, 0.3411;
variance, 0.5972; and covariance, 0.0617)8 – it is possible to say that the
bulk of the forecast error is associated with the fact that the variation of
the forecast is far from the variation of the actual series, followed by the
fact that the mean of the forecast is far from the mean of the actual series,
with the least error being associated with the covariance proportion.9
8 The Root Mean Squared Error, although very important, is not evaluated here,
as it is used to compare between forecasts.
9 In a good forecast, the bias and variance proportions ought to be very low,
with the bulk of the error being attributed to the covariance proportion.
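The three proportions follow the standard decomposition of the forecast mean
squared error; a sketch in Python, where actual and forecast are hypothetical
arrays holding the ex post data:

import numpy as np

def forecast_proportions(actual, forecast):
    # Decomposes the forecast MSE into bias, variance and covariance
    # proportions; the three numbers sum to one.
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    mse = np.mean((f - a) ** 2)
    bias = (f.mean() - a.mean()) ** 2 / mse
    variance = (f.std() - a.std()) ** 2 / mse
    covariance = 2.0 * (1.0 - np.corrcoef(a, f)[0, 1]) * a.std() * f.std() / mse
    return bias, variance, covariance

# The plot bounds in Graph 8: forecast plus/minus twice its standard error.
# upper, lower = forecast + 2 * se_forecast, forecast - 2 * se_forecast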
Table 43. Descriptive Statistics

            IM        YD        POP         DJ         CPI      EX        GDP        CHGINV    NAFTA   GOLD    RECES
Mean        757.22    5402.41   240896.80   3770.61    106.52   580.11    7339.93     24.62    0.38    0.22    0.20
Median      505.55    5078.25   237375.50   1262.77    105.78   357.77    6708.77     25.01    0.00    0.00    0.00
Max.        2208.34   10095.10  308413.30   13379.36   218.91   1670.43   13415.27   117.20    1.00    1.00    1.00
Min.        108.45    1955.50   179590.30   573.47     29.40    94.76     2802.62   -160.22    0.00    0.00    0.00
Std. Dev.   647.76    2415.92   36955.19    3940.41    61.10    455.34    3202.17     37.41    0.49    0.42    0.40
Skewness    0.97      0.43      0.18        1.01       0.20     0.78      0.43       -1.21     0.49    1.35    1.54
Kurtosis    2.54      2.03      1.85        2.41       1.66     2.29      1.95        7.82     1.24    2.83    3.37
J-B         32.94     13.94     12.07       37.21      16.29    24.47     15.35      242.21    33.83   61.16   80.16
Prob.       0.00      0.00      0.00        0.00       0.00     0.00      0.00        0.00     0.00    0.00    0.00
Obs.        200       200       200         200        200      200       200         200      200     200     200

Source: Authors’ own table based on calculations conducted with EViews software.
Final Remarks
Performing econometric research is a science and writing a clear description
is an art. The purpose of this book was to guide you through bringing both
of those skills together. Be it describing variables or using the LM test to find
the presence of autocorrelation, the important thing is to understand that
conducting research is a step-by-step process. Yet, just like any other highly
structured process, even this one sometimes requires adjustments. Think, make
a plan, take notes and you will be fine.
And always remember: if the p-value is low, the null has to go.
Statistical Tables
z-table
Area between 0 and z

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
t-table

degrees of   probability
freedom      (one-tail test)  0.4        0.25       0.1        0.05       0.025     0.01      0.005     0.0005
             (two-tail test)  0.8        0.5        0.2        0.1        0.05      0.02      0.01      0.001
1                             0.32492    1          3.077684   6.313752   12.7062   31.82052  63.65674  636.6192
2                             0.288675   0.816497   1.885618   2.919986   4.30265   6.96456   9.92484   31.5991
3                             0.276671   0.764892   1.637744   2.353363   3.18245   4.5407    5.84091   12.924
4                             0.270722   0.740697   1.533206   2.131847   2.77645   3.74695   4.60409   8.6103
5                             0.267181   0.726687   1.475884   2.015048   2.57058   3.36493   4.03214   6.8688
6                             0.264835   0.717558   1.439756   1.94318    2.44691   3.14267   3.70743   5.9588
7                             0.263167   0.711142   1.414924   1.894579   2.36462   2.99795   3.49948   5.4079
8                             0.261921   0.706387   1.396815   1.859548   2.306     2.89646   3.35539   5.0413
9                             0.260955   0.702722   1.383029   1.833113   2.26216   2.82144   3.24984   4.7809
10                            0.260185   0.699812   1.372184   1.812461   2.22814   2.76377   3.16927   4.5869
11                            0.259556   0.697445   1.36343    1.795885   2.20099   2.71808   3.10581   4.437
12                            0.259033   0.695483   1.356217   1.782288   2.17881   2.681     3.05454   4.3178
13                            0.258591   0.693829   1.350171   1.770933   2.16037   2.65031   3.01228   4.2208
14                            0.258213   0.692417   1.34503    1.76131    2.14479   2.62449   2.97684   4.1405
15                            0.257885   0.691197   1.340606   1.75305    2.13145   2.60248   2.94671   4.0728
16                            0.257599   0.690132   1.336757   1.745884   2.11991   2.58349   2.92078   4.015
17                            0.257347   0.689195   1.333379   1.739607   2.10982   2.56693   2.89823   3.9651
18                            0.257123   0.688364   1.330391   1.734064   2.10092   2.55238   2.87844   3.9216
19                            0.256923   0.687621   1.327728   1.729133   2.09302   2.53948   2.86093   3.8834
20                            0.256743   0.686954   1.325341   1.724718   2.08596   2.52798   2.84534   3.8495
21                            0.25658    0.686352   1.323188   1.720743   2.07961   2.51765   2.83136   3.8193
22                            0.256432   0.685805   1.321237   1.717144   2.07387   2.50832   2.81876   3.7921
23                            0.256297   0.685306   1.31946    1.713872   2.06866   2.49987   2.80734   3.7676
24                            0.256173   0.68485    1.317836   1.710882   2.0639    2.49216   2.79694   3.7454
25                            0.25606    0.68443    1.316345   1.708141   2.05954   2.48511   2.78744   3.7251
26                            0.255955   0.684043   1.314972   1.705618   2.05553   2.47863   2.77871   3.7066
27                            0.255858   0.683685   1.313703   1.703288   2.05183   2.47266   2.77068   3.6896
28                            0.255768   0.683353   1.312527   1.701131   2.04841   2.46714   2.76326   3.6739
29                            0.255684   0.683044   1.311434   1.699127   2.04523   2.46202   2.75639   3.6594
30                            0.255605   0.682756   1.310415   1.697261   2.04227   2.45726   2.75      3.646
inf                           0.253347   0.67449    1.281552   1.644854   1.95996   2.32635   2.57583   3.2905
F-table at 0.01 level of significance

df in          df in numerator
denominator    1         2         3         4         5         6
1              4052.181  4999.5    5403.352  5624.583  5763.65   5858.986
2              98.503    99        99.166    99.249    99.299    99.333
3              34.116    30.817    29.457    28.71     28.237    27.911
4              21.198    18        16.694    15.977    15.522    15.207
5              16.258    13.274    12.06     11.392    10.967    10.672
6              13.745    10.925    9.78      9.148     8.746     8.466
7              12.246    9.547     8.451     7.847     7.46      7.191
8              11.259    8.649     7.591     7.006     6.632     6.371
9              10.561    8.022     6.992     6.422     6.057     5.802
10             10.044    7.559     6.552     5.994     5.636     5.386
11             9.646     7.206     6.217     5.668     5.316     5.069
12             9.33      6.927     5.953     5.412     5.064     4.821
13             9.074     6.701     5.739     5.205     4.862     4.62
14             8.862     6.515     5.564     5.035     4.695     4.456
15             8.683     6.359     5.417     4.893     4.556     4.318
16             8.531     6.226     5.292     4.773     4.437     4.202
17             8.4       6.112     5.185     4.669     4.336     4.102
18             8.285     6.013     5.092     4.579     4.248     4.015
19             8.185     5.926     5.01      4.5       4.171     3.939
20             8.096     5.849     4.938     4.431     4.103     3.871
21             8.017     5.78      4.874     4.369     4.042     3.812
22             7.945     5.719     4.817     4.313     3.988     3.758
23             7.881     5.664     4.765     4.264     3.939     3.71
24             7.823     5.614     4.718     4.218     3.895     3.667
25             7.77      5.568     4.675     4.177     3.855     3.627
26             7.721     5.526     4.637     4.14      3.818     3.591
27             7.677     5.488     4.601     4.106     3.785     3.558
28             7.636     5.453     4.568     4.074     3.754     3.528
29             7.598     5.42      4.538     4.045     3.725     3.499
30             7.562     5.39      4.51      4.018     3.699     3.473
40             7.314     5.179     4.313     3.828     3.514     3.291
60             7.077     4.977     4.126     3.649     3.339     3.119
120            6.851     4.787     3.949     3.48      3.174     2.956
inf            6.635     4.605     3.782     3.319     3.017     2.802

df in          df in numerator
denominator    7         8         9         10        12        15
1              5928.356  5981.07   6022.473  6055.847  6106.321  6157.285
2              99.356    99.374    99.388    99.399    99.416    99.433
3              27.672    27.489    27.345    27.229    27.052    26.872
4              14.976    14.799    14.659    14.546    14.374    14.198
5              10.456    10.289    10.158    10.051    9.888     9.722
6              8.26      8.102     7.976     7.874     7.718     7.559
7              6.993     6.84      6.719     6.62      6.469     6.314
8              6.178     6.029     5.911     5.814     5.667     5.515
9              5.613     5.467     5.351     5.257     5.111     4.962
10             5.2       5.057     4.942     4.849     4.706     4.558
11             4.886     4.744     4.632     4.539     4.397     4.251
12             4.64      4.499     4.388     4.296     4.155     4.01
13             4.441     4.302     4.191     4.1       3.96      3.815
14             4.278     4.14      4.03      3.939     3.8       3.656
15             4.142     4.004     3.895     3.805     3.666     3.522
16             4.026     3.89      3.78      3.691     3.553     3.409
17             3.927     3.791     3.682     3.593     3.455     3.312
18             3.841     3.705     3.597     3.508     3.371     3.227
19             3.765     3.631     3.523     3.434     3.297     3.153
20             3.699     3.564     3.457     3.368     3.231     3.088
21             3.64      3.506     3.398     3.31      3.173     3.03
22             3.587     3.453     3.346     3.258     3.121     2.978
23             3.539     3.406     3.299     3.211     3.074     2.931
24             3.496     3.363     3.256     3.168     3.032     2.889
25             3.457     3.324     3.217     3.129     2.993     2.85
26             3.421     3.288     3.182     3.094     2.958     2.815
27             3.388     3.256     3.149     3.062     2.926     2.783
28             3.358     3.226     3.12      3.032     2.896     2.753
29             3.33      3.198     3.092     3.005     2.868     2.726
30             3.304     3.173     3.067     2.979     2.843     2.7
40             3.124     2.993     2.888     2.801     2.665     2.522
60             2.953     2.823     2.718     2.632     2.496     2.352
120            2.792     2.663     2.559     2.472     2.336     2.192
inf            2.639     2.511     2.407     2.321     2.185     2.039

df in          df in numerator
denominator    20        30        40        60        120       inf
1              6208.73   6260.649  6286.782  6313.03   6339.391  6365.864
2              99.449    99.466    99.474    99.482    99.491    99.499
3              26.69     26.505    26.411    26.316    26.221    26.125
4              14.02     13.838    13.745    13.652    13.558    13.463
5              9.553     9.379     9.291     9.202     9.112     9.02
6              7.396     7.229     7.143     7.057     6.969     6.88
7              6.155     5.992     5.908     5.824     5.737     5.65
8              5.359     5.198     5.116     5.032     4.946     4.859
9              4.808     4.649     4.567     4.483     4.398     4.311
10             4.405     4.247     4.165     4.082     3.996     3.909
11             4.099     3.941     3.86      3.776     3.69      3.602
12             3.858     3.701     3.619     3.535     3.449     3.361
13             3.665     3.507     3.425     3.341     3.255     3.165
14             3.505     3.348     3.266     3.181     3.094     3.004
15             3.372     3.214     3.132     3.047     2.959     2.868
16             3.259     3.101     3.018     2.933     2.845     2.753
17             3.162     3.003     2.92      2.835     2.746     2.653
18             3.077     2.919     2.835     2.749     2.66      2.566
19             3.003     2.844     2.761     2.674     2.584     2.489
20             2.938     2.778     2.695     2.608     2.517     2.421
21             2.88      2.72      2.636     2.548     2.457     2.36
22             2.827     2.667     2.583     2.495     2.403     2.305
23             2.781     2.62      2.535     2.447     2.354     2.256
24             2.738     2.577     2.492     2.403     2.31      2.211
25             2.699     2.538     2.453     2.364     2.27      2.169
26             2.664     2.503     2.417     2.327     2.233     2.131
27             2.632     2.47      2.384     2.294     2.198     2.097
28             2.602     2.44      2.354     2.263     2.167     2.064
29             2.574     2.412     2.325     2.234     2.138     2.034
30             2.549     2.386     2.299     2.208     2.111     2.006
40             2.369     2.203     2.114     2.019     1.917     1.805
60             2.198     2.028     1.936     1.836     1.726     1.601
120            2.035     1.86      1.763     1.656     1.533     1.381
inf            1.878     1.696     1.592     1.473     1.325     1
F-table at 0.025 level of significance

df in          df in numerator
denominator    1         2         3         4         5         6
1              647.789   799.5     864.163   899.5833  921.8479  937.1111
2              38.5063   39        39.1655   39.2484   39.2982   39.3315
3              17.4434   16.0441   15.4392   15.101    14.8848   14.7347
4              12.2179   10.6491   9.9792    9.6045    9.3645    9.1973
5              10.007    8.4336    7.7636    7.3879    7.1464    6.9777
6              8.8131    7.2599    6.5988    6.2272    5.9876    5.8198
7              8.0727    6.5415    5.8898    5.5226    5.2852    5.1186
8              7.5709    6.0595    5.416     5.0526    4.8173    4.6517
9              7.2093    5.7147    5.0781    4.7181    4.4844    4.3197
10             6.9367    5.4564    4.8256    4.4683    4.2361    4.0721
11             6.7241    5.2559    4.63      4.2751    4.044     3.8807
12             6.5538    5.0959    4.4742    4.1212    3.8911    3.7283
13             6.4143    4.9653    4.3472    3.9959    3.7667    3.6043
14             6.2979    4.8567    4.2417    3.8919    3.6634    3.5014
15             6.1995    4.765     4.1528    3.8043    3.5764    3.4147
16             6.1151    4.6867    4.0768    3.7294    3.5021    3.3406
17             6.042     4.6189    4.0112    3.6648    3.4379    3.2767
18             5.9781    4.5597    3.9539    3.6083    3.382     3.2209
19             5.9216    4.5075    3.9034    3.5587    3.3327    3.1718
20             5.8715    4.4613    3.8587    3.5147    3.2891    3.1283
21             5.8266    4.4199    3.8188    3.4754    3.2501    3.0895
22             5.7863    4.3828    3.7829    3.4401    3.2151    3.0546
23             5.7498    4.3492    3.7505    3.4083    3.1835    3.0232
24             5.7166    4.3187    3.7211    3.3794    3.1548    2.9946
25             5.6864    4.2909    3.6943    3.353     3.1287    2.9685
26             5.6586    4.2655    3.6697    3.3289    3.1048    2.9447
27             5.6331    4.2421    3.6472    3.3067    3.0828    2.9228
28             5.6096    4.2205    3.6264    3.2863    3.0626    2.9027
29             5.5878    4.2006    3.6072    3.2674    3.0438    2.884
30             5.5675    4.1821    3.5894    3.2499    3.0265    2.8667
40             5.4239    4.051     3.4633    3.1261    2.9037    2.7444
60             5.2856    3.9253    3.3425    3.0077    2.7863    2.6274
120            5.1523    3.8046    3.2269    2.8943    2.674     2.5154
inf            5.0239    3.6889    3.1161    2.7858    2.5665    2.4082

df in          df in numerator
denominator    7         8         9         10        12        15
1              948.2169  956.6562  963.2846  968.6274  976.7079  984.8668
2              39.3552   39.373    39.3869   39.398    39.4146   39.4313
3              14.6244   14.5399   14.4731   14.4189   14.3366   14.2527
4              9.0741    8.9796    8.9047    8.8439    8.7512    8.6565
5              6.8531    6.7572    6.6811    6.6192    6.5245    6.4277
6              5.6955    5.5996    5.5234    5.4613    5.3662    5.2687
7              4.9949    4.8993    4.8232    4.7611    4.6658    4.5678
8              4.5286    4.4333    4.3572    4.2951    4.1997    4.1012
9              4.197     4.102     4.026     3.9639    3.8682    3.7694
10             3.9498    3.8549    3.779     3.7168    3.6209    3.5217
11             3.7586    3.6638    3.5879    3.5257    3.4296    3.3299
12             3.6065    3.5118    3.4358    3.3736    3.2773    3.1772
13             3.4827    3.388     3.312     3.2497    3.1532    3.0527
14             3.3799    3.2853    3.2093    3.1469    3.0502    2.9493
15             3.2934    3.1987    3.1227    3.0602    2.9633    2.8621
16             3.2194    3.1248    3.0488    2.9862    2.889     2.7875
17             3.1556    3.061     2.9849    2.9222    2.8249    2.723
18             3.0999    3.0053    2.9291    2.8664    2.7689    2.6667
19             3.0509    2.9563    2.8801    2.8172    2.7196    2.6171
20             3.0074    2.9128    2.8365    2.7737    2.6758    2.5731
21             2.9686    2.874     2.7977    2.7348    2.6368    2.5338
22             2.9338    2.8392    2.7628    2.6998    2.6017    2.4984
23             2.9023    2.8077    2.7313    2.6682    2.5699    2.4665
24             2.8738    2.7791    2.7027    2.6396    2.5411    2.4374
25             2.8478    2.7531    2.6766    2.6135    2.5149    2.411
26             2.824     2.7293    2.6528    2.5896    2.4908    2.3867
27             2.8021    2.7074    2.6309    2.5676    2.4688    2.3644
28             2.782     2.6872    2.6106    2.5473    2.4484    2.3438
29             2.7633    2.6686    2.5919    2.5286    2.4295    2.3248
30             2.746     2.6513    2.5746    2.5112    2.412     2.3072
40             2.6238    2.5289    2.4519    2.3882    2.2882    2.1819
60             2.5068    2.4117    2.3344    2.2702    2.1692    2.0613
120            2.3948    2.2994    2.2217    2.157     2.0548    1.945
inf            2.2875    2.1918    2.1136    2.0483    1.9447    1.8326

df in          df in numerator
denominator    20        30        40        60        120       inf
1              993.1028  1001.414  1005.598  1009.8    1014.02   1018.258
2              39.4479   39.465    39.473    39.481    39.49     39.498
3              14.1674   14.081    14.037    13.992    13.947    13.902
4              8.5599    8.461     8.411     8.36      8.309     8.257
5              6.3286    6.227     6.175     6.123     6.069     6.015
6              5.1684    5.065     5.012     4.959     4.904     4.849
7              4.4667    4.362     4.309     4.254     4.199     4.142
8              3.9995    3.894     3.84      3.784     3.728     3.67
9              3.6669    3.56      3.505     3.449     3.392     3.333
10             3.4185    3.311     3.255     3.198     3.14      3.08
11             3.2261    3.118     3.061     3.004     2.944     2.883
12             3.0728    2.963     2.906     2.848     2.787     2.725
13             2.9477    2.837     2.78      2.72      2.659     2.595
14             2.8437    2.732     2.674     2.614     2.552     2.487
15             2.7559    2.644     2.585     2.524     2.461     2.395
16             2.6808    2.568     2.509     2.447     2.383     2.316
17             2.6158    2.502     2.442     2.38      2.315     2.247
18             2.559     2.445     2.384     2.321     2.256     2.187
19             2.5089    2.394     2.333     2.27      2.203     2.133
20             2.4645    2.349     2.287     2.223     2.156     2.085
21             2.4247    2.308     2.246     2.182     2.114     2.042
22             2.389     2.272     2.21      2.145     2.076     2.003
23             2.3567    2.239     2.176     2.111     2.041     1.968
24             2.3273    2.209     2.146     2.08      2.01      1.935
25             2.3005    2.182     2.118     2.052     1.981     1.906
26             2.2759    2.157     2.093     2.026     1.954     1.878
27             2.2533    2.133     2.069     2.002     1.93      1.853
28             2.2324    2.112     2.048     1.98      1.907     1.829
29             2.2131    2.092     2.028     1.959     1.886     1.807
30             2.1952    2.074     2.009     1.94      1.866     1.787
40             2.0677    1.943     1.875     1.803     1.724     1.637
60             1.9445    1.815     1.744     1.667     1.581     1.482
120            1.8249    1.69      1.614     1.53      1.433     1.31
inf            1.7085    1.566     1.484     1.388     1.268     1
F-table at 0.05 level of significance

df in          df in numerator
denominator    1         2         3         4         5         6
1              161.4476  199.5     215.7073  224.5832  230.1619  233.986
2              18.5128   19        19.1643   19.2468   19.2964   19.3295
3              10.128    9.5521    9.2766    9.1172    9.0135    8.9406
4              7.7086    6.9443    6.5914    6.3882    6.2561    6.1631
5              6.6079    5.7861    5.4095    5.1922    5.0503    4.9503
6              5.9874    5.1433    4.7571    4.5337    4.3874    4.2839
7              5.5914    4.7374    4.3468    4.1203    3.9715    3.866
8              5.3177    4.459     4.0662    3.8379    3.6875    3.5806
9              5.1174    4.2565    3.8625    3.6331    3.4817    3.3738
10             4.9646    4.1028    3.7083    3.478     3.3258    3.2172
11             4.8443    3.9823    3.5874    3.3567    3.2039    3.0946
12             4.7472    3.8853    3.4903    3.2592    3.1059    2.9961
13             4.6672    3.8056    3.4105    3.1791    3.0254    2.9153
14             4.6001    3.7389    3.3439    3.1122    2.9582    2.8477
15             4.5431    3.6823    3.2874    3.0556    2.9013    2.7905
16             4.494     3.6337    3.2389    3.0069    2.8524    2.7413
17             4.4513    3.5915    3.1968    2.9647    2.81      2.6987
18             4.4139    3.5546    3.1599    2.9277    2.7729    2.6613
19             4.3807    3.5219    3.1274    2.8951    2.7401    2.6283
20             4.3512    3.4928    3.0984    2.8661    2.7109    2.599
21             4.3248    3.4668    3.0725    2.8401    2.6848    2.5727
22             4.3009    3.4434    3.0491    2.8167    2.6613    2.5491
23             4.2793    3.4221    3.028     2.7955    2.64      2.5277
24             4.2597    3.4028    3.0088    2.7763    2.6207    2.5082
25             4.2417    3.3852    2.9912    2.7587    2.603     2.4904
26             4.2252    3.369     2.9752    2.7426    2.5868    2.4741
27             4.21      3.3541    2.9604    2.7278    2.5719    2.4591
28             4.196     3.3404    2.9467    2.7141    2.5581    2.4453
29             4.183     3.3277    2.934     2.7014    2.5454    2.4324
30             4.1709    3.3158    2.9223    2.6896    2.5336    2.4205
40             4.0847    3.2317    2.8387    2.606     2.4495    2.3359
60             4.0012    3.1504    2.7581    2.5252    2.3683    2.2541
120            3.9201    3.0718    2.6802    2.4472    2.2899    2.175
inf            3.8415    2.9957    2.6049    2.3719    2.2141    2.0986

df in          df in numerator
denominator    7         8         9         10        12        15
1              236.7684  238.8827  240.5433  241.8817  243.906   245.9499
2              19.3532   19.371    19.3848   19.3959   19.4125   19.4291
3              8.8867    8.8452    8.8123    8.7855    8.7446    8.7029
4              6.0942    6.041     5.9988    5.9644    5.9117    5.8578
5              4.8759    4.8183    4.7725    4.7351    4.6777    4.6188
6              4.2067    4.1468    4.099     4.06      3.9999    3.9381
7              3.787     3.7257    3.6767    3.6365    3.5747    3.5107
8              3.5005    3.4381    3.3881    3.3472    3.2839    3.2184
9              3.2927    3.2296    3.1789    3.1373    3.0729    3.0061
10             3.1355    3.0717    3.0204    2.9782    2.913     2.845
11             3.0123    2.948     2.8962    2.8536    2.7876    2.7186
12             2.9134    2.8486    2.7964    2.7534    2.6866    2.6169
13             2.8321    2.7669    2.7144    2.671     2.6037    2.5331
14             2.7642    2.6987    2.6458    2.6022    2.5342    2.463
15             2.7066    2.6408    2.5876    2.5437    2.4753    2.4034
16             2.6572    2.5911    2.5377    2.4935    2.4247    2.3522
17             2.6143    2.548     2.4943    2.4499    2.3807    2.3077
18             2.5767    2.5102    2.4563    2.4117    2.3421    2.2686
19             2.5435    2.4768    2.4227    2.3779    2.308     2.2341
20             2.514     2.4471    2.3928    2.3479    2.2776    2.2033
21             2.4876    2.4205    2.366     2.321     2.2504    2.1757
22             2.4638    2.3965    2.3419    2.2967    2.2258    2.1508
23             2.4422    2.3748    2.3201    2.2747    2.2036    2.1282
24             2.4226    2.3551    2.3002    2.2547    2.1834    2.1077
25             2.4047    2.3371    2.2821    2.2365    2.1649    2.0889
26             2.3883    2.3205    2.2655    2.2197    2.1479    2.0716
27             2.3732    2.3053    2.2501    2.2043    2.1323    2.0558
28             2.3593    2.2913    2.236     2.19      2.1179    2.0411
29             2.3463    2.2783    2.2229    2.1768    2.1045    2.0275
30             2.3343    2.2662    2.2107    2.1646    2.0921    2.0148
40             2.249     2.1802    2.124     2.0772    2.0035    1.9245
60             2.1665    2.097     2.0401    1.9926    1.9174    1.8364
120            2.0868    2.0164    1.9588    1.9105    1.8337    1.7505
inf            2.0096    1.9384    1.8799    1.8307    1.7522    1.6664

df in          df in numerator
denominator    20        30        40        60        120       inf
1              248.0131  250.0951  251.1432  252.1957  253.2529  254.3144
2              19.4458   19.4624   19.4707   19.4791   19.4874   19.4957
3              8.6602    8.6166    8.5944    8.572     8.5494    8.5264
4              5.8025    5.7459    5.717     5.6877    5.6581    5.6281
5              4.5581    4.4957    4.4638    4.4314    4.3985    4.365
6              3.8742    3.8082    3.7743    3.7398    3.7047    3.6689
7              3.4445    3.3758    3.3404    3.3043    3.2674    3.2298
8              3.1503    3.0794    3.0428    3.0053    2.9669    2.9276
9              2.9365    2.8637    2.8259    2.7872    2.7475    2.7067
10             2.774     2.6996    2.6609    2.6211    2.5801    2.5379
11             2.6464    2.5705    2.5309    2.4901    2.448     2.4045
12             2.5436    2.4663    2.4259    2.3842    2.341     2.2962
13             2.4589    2.3803    2.3392    2.2966    2.2524    2.2064
14             2.3879    2.3082    2.2664    2.2229    2.1778    2.1307
15             2.3275    2.2468    2.2043    2.1601    2.1141    2.0658
16             2.2756    2.1938    2.1507    2.1058    2.0589    2.0096
17             2.2304    2.1477    2.104     2.0584    2.0107    1.9604
18             2.1906    2.1071    2.0629    2.0166    1.9681    1.9168
19             2.1555    2.0712    2.0264    1.9795    1.9302    1.878
20             2.1242    2.0391    1.9938    1.9464    1.8963    1.8432
21             2.096     2.0102    1.9645    1.9165    1.8657    1.8117
22             2.0707    1.9842    1.938     1.8894    1.838     1.7831
23             2.0476    1.9605    1.9139    1.8648    1.8128    1.757
24             2.0267    1.939     1.892     1.8424    1.7896    1.733
25             2.0075    1.9192    1.8718    1.8217    1.7684    1.711
26             1.9898    1.901     1.8533    1.8027    1.7488    1.6906
27             1.9736    1.8842    1.8361    1.7851    1.7306    1.6717
28             1.9586    1.8687    1.8203    1.7689    1.7138    1.6541
29             1.9446    1.8543    1.8055    1.7537    1.6981    1.6376
30             1.9317    1.8409    1.7918    1.7396    1.6835    1.6223
40             1.8389    1.7444    1.6928    1.6373    1.5766    1.5089
60             1.748     1.6491    1.5943    1.5343    1.4673    1.3893
120            1.6587    1.5543    1.4952    1.429     1.3519    1.2539
inf            1.5705    1.4591    1.394     1.318     1.2214    1
F-table at 0.1 level of significance

df in          df in numerator
denominator    1         2         3         4         5         6
1              39.86346  49.5      53.59324  55.83296  57.24008  58.20442
2              8.52632   9         9.16179   9.24342   9.29263   9.32553
3              5.53832   5.46238   5.39077   5.34264   5.30916   5.28473
4              4.54477   4.32456   4.19086   4.10725   4.05058   4.00975
5              4.06042   3.77972   3.61948   3.5202    3.45298   3.40451
6              3.77595   3.4633    3.28876   3.18076   3.10751   3.05455
7              3.58943   3.25744   3.07407   2.96053   2.88334   2.82739
8              3.45792   3.11312   2.9238    2.80643   2.72645   2.66833
9              3.3603    3.00645   2.81286   2.69268   2.61061   2.55086
10             3.28502   2.92447   2.72767   2.60534   2.52164   2.46058
11             3.2252    2.85951   2.66023   2.53619   2.45118   2.38907
12             3.17655   2.8068    2.60552   2.4801    2.39402   2.33102
13             3.13621   2.76317   2.56027   2.43371   2.34672   2.28298
14             3.10221   2.72647   2.52222   2.39469   2.30694   2.24256
15             3.07319   2.69517   2.48979   2.36143   2.27302   2.20808
16             3.04811   2.66817   2.46181   2.33274   2.24376   2.17833
17             3.02623   2.64464   2.43743   2.30775   2.21825   2.15239
18             3.00698   2.62395   2.41601   2.28577   2.19583   2.12958
19             2.9899    2.60561   2.39702   2.2663    2.17596   2.10936
20             2.97465   2.58925   2.38009   2.24893   2.15823   2.09132
21             2.96096   2.57457   2.36489   2.23334   2.14231   2.07512
22             2.94858   2.56131   2.35117   2.21927   2.12794   2.0605
23             2.93736   2.54929   2.33873   2.20651   2.11491   2.04723
24             2.92712   2.53833   2.32739   2.19488   2.10303   2.03513
25             2.91774   2.52831   2.31702   2.18424   2.09216   2.02406
26             2.90913   2.5191    2.30749   2.17447   2.08218   2.01389
27             2.90119   2.51061   2.29871   2.16546   2.07298   2.00452
28             2.89385   2.50276   2.2906    2.15714   2.06447   1.99585
29             2.88703   2.49548   2.28307   2.14941   2.05658   1.98781
30             2.88069   2.48872   2.27607   2.14223   2.04925   1.98033
40             2.83535   2.44037   2.22609   2.09095   1.99682   1.92688
60             2.79107   2.39325   2.17741   2.04099   1.94571   1.87472
120            2.74781   2.34734   2.12999   1.9923    1.89587   1.82381
inf            2.70554   2.30259   2.0838    1.94486   1.84727   1.77411

df in          df in numerator
denominator    7         8         9         10        12        15
1              58.90595  59.43898  59.85759  60.19498  60.70521  61.22034
2              9.34908   9.36677   9.38054   9.39157   9.40813   9.42471
3              5.26619   5.25167   5.24      5.23041   5.21562   5.20031
4              3.97897   3.95494   3.93567   3.91988   3.89553   3.87036
5              3.3679    3.33928   3.31628   3.2974    3.26824   3.23801
6              3.01446   2.98304   2.95774   2.93693   2.90472   2.87122
7              2.78493   2.75158   2.72468   2.70251   2.66811   2.63223
8              2.62413   2.58935   2.56124   2.53804   2.50196   2.46422
9              2.50531   2.46941   2.44034   2.41632   2.37888   2.33962
10             2.41397   2.37715   2.34731   2.3226    2.28405   2.24351
11             2.34157   2.304     2.2735    2.24823   2.20873   2.16709
12             2.28278   2.24457   2.21352   2.18776   2.14744   2.10485
13             2.2341    2.19535   2.16382   2.13763   2.09659   2.05316
14             2.19313   2.1539    2.12195   2.0954    2.05371   2.00953
15             2.15818   2.11853   2.08621   2.05932   2.01707   1.97222
16             2.128     2.08798   2.05533   2.02815   1.98539   1.93992
17             2.10169   2.06134   2.02839   2.00094   1.95772   1.91169
18             2.07854   2.03789   2.00467   1.97698   1.93334   1.88681
19             2.05802   2.0171    1.98364   1.95573   1.9117    1.86471
20             2.0397    1.99853   1.96485   1.93674   1.89236   1.84494
21             2.02325   1.98186   1.94797   1.91967   1.87497   1.82715
22             2.0084    1.9668    1.93273   1.90425   1.85925   1.81106
23             1.99492   1.95312   1.91888   1.89025   1.84497   1.79643
24             1.98263   1.94066   1.90625   1.87748   1.83194   1.78308
25             1.97138   1.92925   1.89469   1.86578   1.82      1.77083
26             1.96104   1.91876   1.88407   1.85503   1.80902   1.75957
27             1.95151   1.90909   1.87427   1.84511   1.79889   1.74917
28             1.9427    1.90014   1.8652    1.83593   1.78951   1.73954
29             1.93452   1.89184   1.85679   1.82741   1.78081   1.7306
30             1.92692   1.88412   1.84896   1.81949   1.7727    1.72227
40             1.87252   1.82886   1.7929    1.76269   1.71456   1.66241
60             1.81939   1.77483   1.73802   1.70701   1.65743   1.60337
120            1.76748   1.72196   1.68425   1.65238   1.6012    1.545
inf            1.71672   1.6702    1.63152   1.59872   1.54578   1.48714

df in          df in numerator
denominator    20        30        40        60        120       inf
1              61.74029  62.26497  62.52905  62.79428  63.06064  63.32812
2              9.44131   9.45793   9.46624   9.47456   9.48289   9.49122
3              5.18448   5.16811   5.15972   5.15119   5.14251   5.1337
4              3.84434   3.81742   3.80361   3.78957   3.77527   3.76073
5              3.20665   3.17408   3.15732   3.14023   3.12279   3.105
6              2.83634   2.79996   2.78117   2.76195   2.74229   2.72216
7              2.59473   2.55546   2.5351    2.51422   2.49279   2.47079
8              2.42464   2.38302   2.36136   2.3391    2.31618   2.29257
9              2.29832   2.25472   2.23196   2.20849   2.18427   2.15923
10             2.20074   2.15543   2.13169   2.10716   2.08176   2.05542
11             2.12305   2.07621   2.05161   2.02612   1.99965   1.97211
12             2.05968   2.01149   1.9861    1.95973   1.93228   1.90361
13             2.00698   1.95757   1.93147   1.90429   1.87591   1.8462
14             1.96245   1.91193   1.88516   1.85723   1.828     1.79728
15             1.92431   1.87277   1.84539   1.81676   1.78672   1.75505
16             1.89127   1.83879   1.81084   1.78156   1.75075   1.71817
17             1.86236   1.80901   1.78053   1.75063   1.71909   1.68564
18             1.83685   1.78269   1.75371   1.72322   1.69099   1.65671
19             1.81416   1.75924   1.72979   1.69876   1.66587   1.63077
20             1.79384   1.73822   1.70833   1.67678   1.64326   1.60738
21             1.77555   1.71927   1.68896   1.65691   1.62278   1.58615
22             1.75899   1.70208   1.67138   1.63885   1.60415   1.56678
23             1.74392   1.68643   1.65535   1.62237   1.58711   1.54903
24             1.73015   1.6721    1.64067   1.60726   1.57146   1.5327
25             1.71752   1.65895   1.62718   1.59335   1.55703   1.5176
26             1.70589   1.64682   1.61472   1.5805    1.54368   1.5036
27             1.69514   1.6356    1.6032    1.56859   1.53129   1.49057
28             1.68519   1.62519   1.5925    1.55753   1.51976   1.47841
29             1.67593   1.61551   1.58253   1.54721   1.50899   1.46704
30             1.66731   1.60648   1.57323   1.53757   1.49891   1.45636
40             1.60515   1.54108   1.50562   1.46716   1.42476   1.37691
60             1.54349   1.47554   1.43734   1.3952    1.34757   1.29146
120            1.48207   1.40938   1.3676    1.32034   1.26457   1.19256
inf            1.4206    1.34187   1.29513   1.23995   1.1686    1
χ2 distribution table

degrees of   probability
freedom      0.995      0.99       0.975      0.95       0.9        0.75       0.5
1            0.00004    0.00016    0.00098    0.00393    0.01579    0.10153    0.45494
2            0.01003    0.0201     0.05064    0.10259    0.21072    0.57536    1.38629
3            0.07172    0.11483    0.2158     0.35185    0.58437    1.21253    2.36597
4            0.20699    0.29711    0.48442    0.71072    1.06362    1.92256    3.35669
5            0.41174    0.5543     0.83121    1.14548    1.61031    2.6746     4.35146
6            0.67573    0.87209    1.23734    1.63538    2.20413    3.4546     5.34812
7            0.98926    1.23904    1.68987    2.16735    2.83311    4.25485    6.34581
8            1.34441    1.6465     2.17973    2.73264    3.48954    5.07064    7.34412
9            1.73493    2.0879     2.70039    3.32511    4.16816    5.89883    8.34283
10           2.15586    2.55821    3.24697    3.9403     4.86518    6.7372     9.34182
11           2.60322    3.05348    3.81575    4.57481    5.57778    7.58414    10.341
12           3.07382    3.57057    4.40379    5.22603    6.3038     8.43842    11.34032
13           3.56503    4.10692    5.00875    5.89186    7.0415     9.29907    12.33976
14           4.07467    4.66043    5.62873    6.57063    7.78953    10.16531   13.33927
15           4.60092    5.22935    6.26214    7.26094    8.54676    11.03654   14.33886
16           5.14221    5.81221    6.90766    7.96165    9.31224    11.91222   15.3385
17           5.69722    6.40776    7.56419    8.67176    10.08519   12.79193   16.33818
18           6.2648     7.01491    8.23075    9.39046    10.86494   13.67529   17.3379
19           6.84397    7.63273    8.90652    10.11701   11.65091   14.562     18.33765
20           7.43384    8.2604     9.59078    10.85081   12.44261   15.45177   19.33743
21           8.03365    8.8972     10.2829    11.59131   13.2396    16.34438   20.33723
22           8.64272    9.54249    10.98232   12.33801   14.04149   17.23962   21.33704
23           9.26042    10.19572   11.68855   13.09051   14.84796   18.1373    22.33688
24           9.88623    10.85636   12.40115   13.84843   15.65868   19.03725   23.33673
25           10.51965   11.52398   13.11972   14.61141   16.47341   19.93934   24.33659
26           11.16024   12.19815   13.8439    15.37916   17.29188   20.84343   25.33646
27           11.80759   12.8785    14.57338   16.1514    18.1139    21.7494    26.33634
28           12.46134   13.56471   15.30786   16.92788   18.93924   22.65716   27.33623
29           13.12115   14.25645   16.04707   17.70837   19.76774   23.56659   28.33613
30           13.78672   14.95346   16.79077   18.49266   20.59923   24.47761   29.33603

degrees of   probability
freedom      0.25       0.1        0.05       0.025      0.01       0.005
1            1.3233     2.70554    3.84146    5.02389    6.6349     7.87944
2            2.77259    4.60517    5.99146    7.37776    9.21034    10.59663
3            4.10834    6.25139    7.81473    9.3484     11.34487   12.83816
4            5.38527    7.77944    9.48773    11.14329   13.2767    14.86026
5            6.62568    9.23636    11.0705    12.8325    15.08627   16.7496
6            7.8408     10.64464   12.59159   14.44938   16.81189   18.54758
7            9.03715    12.01704   14.06714   16.01276   18.47531   20.27774
8            10.21885   13.36157   15.50731   17.53455   20.09024   21.95495
9            11.38875   14.68366   16.91898   19.02277   21.66599   23.58935
10           12.54886   15.98718   18.30704   20.48318   23.20925   25.18818
11           13.70069   17.27501   19.67514   21.92005   24.72497   26.75685
12           14.8454    18.54935   21.02607   23.33666   26.21697   28.29952
13           15.98391   19.81193   22.36203   24.7356    27.68825   29.81947
14           17.11693   21.06414   23.68479   26.11895   29.14124   31.31935
15           18.24509   22.30713   24.99579   27.48839   30.57791   32.80132
16           19.36886   23.54183   26.29623   28.84535   31.99993   34.26719
17           20.48868   24.76904   27.58711   30.19101   33.40866   35.71847
18           21.60489   25.98942   28.8693    31.52638   34.80531   37.15645
19           22.71781   27.20357   30.14353   32.85233   36.19087   38.58226
20           23.82769   28.41198   31.41043   34.16961   37.56623   39.99685
21           24.93478   29.61509   32.67057   35.47888   38.93217   41.40106
22           26.03927   30.81328   33.92444   36.78071   40.28936   42.79565
23           27.14134   32.0069    35.17246   38.07563   41.6384    44.18128
24           28.24115   33.19624   36.41503   39.36408   42.97982   45.55851
25           29.33885   34.38159   37.65248   40.64647   44.3141    46.92789
26           30.43457   35.56317   38.88514   41.92317   45.64168   48.28988
27           31.52841   36.74122   40.11327   43.19451   46.96294   49.64492
28           32.62049   37.91592   41.33714   44.46079   48.27824   50.99338
29           33.71091   39.08747   42.55697   45.72229   49.58788   52.33562
30           34.79974   40.25602   43.77297   46.97924   50.89218   53.67196
Bibliography
1) Barro, R.J., D.B. Gordon (1983), A Positive Theory of Monetary Policy in
a Natural Rate Model, “The Journal of Political Economy,” Vol. 91, No. 4,
pp. 589–610, accessed via jstor.org, date of publication: 8.1983, date of
accession: 3.2010, http://www.jstor.org/pss/1831069.
2) Caporale, G.M., L.A. Gil-Alana (2008), Modeling the U.S., U.K. and
Japanese unemployment rates: Fractional integration and structural breaks,
“Computational Statistics & Data Analysis,” Vol. 52, No. 11, pp. 4998–5013,
date of publication: 7.2008, date of accession: 4.2010,
http://www.sciencedirect.com/science/article/B6V8V-4SC78KW1/1/4db0cb865a44de068cb172a6c0f8ece3.
3) Chiang, A.C. (1984), Fundamental Methods of Mathematical Economics,
McGraw-Hill, 1984.
4) Dunaev, B.B. (2005), Measuring Unemployment and Inflation as Wages
Functions, “Cybernetics and System Analysis,” Vol. 41, No. 3, pp. 403–414,
date of publication: 5.2005, date of accession: 4.2010,
http://www.springerlink.com/content/f075283t34082664/.
5) Greene, W.H. (2003), Econometric Analysis, Prentice Hall, 2003.
6) Gujarati, D.N. (2006), Essentials of Econometrics, McGraw-Hill/Irwin, New
York 2006.
7) Hanke, J.E., D.W. Wichern (2005), Business Forecasting, Pearson Education,
2005.
8) Intriligator, M.D. (1978), Econometric Models, Techniques & Applications,
Prentice-Hall, 1978.
9) Montgomery, A.L., V. Zarnowitz, R.S. Tsay, G.C. Tiao (1998), Forecasting the
U.S. Unemployment Rate, “Journal of the American Statistical Association,”
Vol. 93, No. 442, pp. 478–493, accessed via jstor.org, date of publication:
6.1998, date of accession: 3.2010, http://www/jstor.ord/pss/2670094.
10) Pindyck, R.S., D.L. Rubinfeld (1998), Econometric Models and Econometric
Forecasts, Irwin/McGraw-Hill International Editions, Singapore 1998.
11) Proietti, T. (2003), Forecasting the U.S. unemployment rate, “Computational
Statistics & Data Analysis,” Vol. 42, No. 3, pp. 451–476, date of publication:
3.2003, date of accession: 3.2010, http://portal.acm.org/citation.cmf?id=770742.
12) Rothman, Ph. (1998), Forecasting Asymmetric Unemployment Rates, MIT
Press, “The Review of Economics and Statistics,” Vol. 80, Issue 1, pp. 164–168,
date of publication: 2.1998, date of accession: 3.2010,
http://www.mitpressjournals.org/doi/abs/10.1162/003465398557276.
13) Salop, S. (1979), A Model of the Natural Rate of Unemployment, “The
American Economic Review,” Vol. 69, No. 1, pp. 117–125, accessed via jstor.org,
date of publication: 3.1979, date of accession: 4.2010,
http://www/jstor.ord/pss/1802502.
14) Shimer, R. (1998), Why Is the U.S. Unemployment Rate so Much Lower?,
“NBER Macroeconomics Annual,” Vol. 13, pp. 11–61, accessed via jstor.org,
date of publication: 1998, date of accession: 4.2010,
http://www.jstor.org/pss/4623732.
15) Stock, J.H., M.W. Watson (2008), Introduction to Econometrics, Pearson
Education, 2008.
16) Studenmund, A.H. (2006), Using Econometrics. A practical guide, Pearson
Education, 2006.
17) Theil, H. (1971), Principles of Econometrics, A Wiley/Hamilton Publication,
1971.
18) Wooldridge, J.M. (2010), Econometric Analysis of Cross Section and Panel
Data, Massachusetts Institute of Technology, MIT Press, Cambridge 2010.
List of Figures
Equation 1. Basic structural equation, i.e., the skeleton . . . . . . . . . . . . . . . .10
Equation 2. Simple, linear form structural equation for working with
a cross-section data set, with i representing a specific observation . . . . .15
Equation 3. Simple, semi-log form structural equation for working
with a cross-section data set, with i representing a specific
observation – log-linear form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15
Equation 4. Simple, semi-log form structural equation for working
with a cross-section data set, with i representing a specific
observation – linear-log form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15
Equation 5. Simple, full-log form structural equation for working with
a cross-section data set, with i representing a specific observation . . . . .16
Equation 6. Simple, linear form structural equation for working with
a time-series data set, with t representing a specific year . . . . . . . . . . . .16
Equation 7. Simple, linear form structural equation for working
with a panel data set, with i representing cross-section elements,
i.e., host countries, and t representing time-series elements, i.e.,
a specific year. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17
Equation 8. Dummy variable creation: Sale price example, original
equation (no dummy variable) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28
Equation 9. Dummy variable creation: Sale price example, original
equation (with a dummy variable) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29
Equation 10. Dummy variable creation: Sale price example, original
equation (with two dummy variables) . . . . . . . . . . . . . . . . . . . . . . . . . . .29
Equation 11. Simple averaging method . . . . . . . . . . . . . . . . . . . . . . . . . . . .31
Equation 12. Model estimation with forward stepwise method
example – initial structural, restricted equation . . . . . . . . . . . . . . . . . . . .34
Equation 13. Model estimation with forward stepwise method
example – initial structural, restricted model . . . . . . . . . . . . . . . . . . . . . .35
Equation 14. Model estimation with forward stepwise method
example – auxiliary structural equation . . . . . . . . . . . . . . . . . . . . . . . . . .35
Equation 15. Model estimation with forward stepwise method
example – auxiliary structural model . . . . . . . . . . . . . . . . . . . . . . . . . . . .35
Equation 16. Lagrange Multiplier formula . . . . . . . . . . . . . . . . . . . . . . . . . .36
Equation 17. Model estimation with forward stepwise method
example – initial structural, unrestricted model . . . . . . . . . . . . . . . . . . . .37
Equation 18. Structural equation with a AR(p) term . . . . . . . . . . . . . . . . . . .42
Equation 19. Structural equation with AR(p) terms 1 through 3 . . . . . . . . . .42
Equation 20. Structural equation with lagged dependent variable as
an additional explanatory variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .43
Equation 21. Adjustment of the nth coefficient with r lagged
dependent variables used as independent factors . . . . . . . . . . . . . . . . . .43
Equation 22. Adjustment of the 1st coefficient with one lagged
dependent variables used as independent factors . . . . . . . . . . . . . . . . . .43
Equation 23. Linear structural equation of the model used in Model’s
Results Interpretation chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .47
Equation 24. Estimated version of the linear structural equation of
the model used in Model’s Results Interpretation chapter . . . . . . . . . . . .48
Equation 25. Example of the t-test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .51
Equation 26. Joint significance test – structural model, restricted. . . . . . . . .51
Equation 27. Joint significance test – structural model, unrestricted. . . . . . .52
Equation 28. F-test formula with Error Sum Squares. . . . . . . . . . . . . . . . . . .52
Equation 29. R2 of the unrestricted model as a function of its Error
Sum of Squares and Total Sum of Squares . . . . . . . . . . . . . . . . . . . . . . . .52
Equation 30. R2 of the unrestricted model as a function of its Error
Sum of Squares and Total Sum of Squares . . . . . . . . . . . . . . . . . . . . . . . .53
Equation 31. F-test formula with R-squared . . . . . . . . . . . . . . . . . . . . . . . . .53
Equation 32. R-squared formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54
Equation 33. Adjusted R-squared formula. . . . . . . . . . . . . . . . . . . . . . . . . . .54
Equation 34. Total Sum or Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55
Equation 35. Error (Residual) sum of squares . . . . . . . . . . . . . . . . . . . . . . . .55
Equation 36. Total sum of squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55
Equation 37. Regression sum of squares . . . . . . . . . . . . . . . . . . . . . . . . . . . .55
Equation 38. Structural equation for U.S. imports
as the dependent variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .67
Equation 39. Restricted structural equation . . . . . . . . . . . . . . . . . . . . . . . . .73
Equation 40. Restricted structural model . . . . . . . . . . . . . . . . . . . . . . . . . . .73
Equation 41. Structural auxiliary equation . . . . . . . . . . . . . . . . . . . . . . . . . .75
Equation 42. Unrestricted structural model. . . . . . . . . . . . . . . . . . . . . . . . . .76
Figure 1. Division of the original data set into Estimation Period, Ex
post and Ex ante sections. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
Graph 1. U.S. gross domestic product (left-hand axis in billion, USD) . . . . . .20
Graph 2. U.S. gross domestic product (left-hand axis in billion, USD)
with a linear trendline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21
Graph 3. A graphical representation of U.S. GDP after it has been
transformed into a stationary variable via first-order differencing; D(GDP) . . . 23
Graph 4. Graph of residuals of a model with U.S. imports (IM) as the
dependent variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .40
Graph 5. Graph of actual values (Actual), the fitted model (Fitted);
both on left-hand side axis, and resulting residuals (Residuals);
right-hand axis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .56
Graph 6. A plot of the original U.S. imports data (IM) versus the
forecast (IMF) and the upper (UP) and lower (DOWN) limits . . . . . . . . . .61
Graph 7. Actual, fitted data and residuals of the final model . . . . . . . . . . . .80
Graph 8. Ex post forecast of the final model . . . . . . . . . . . . . . . . . . . . . . . . .81
Table 1. An example of panel data with averages per firm and per
year listed in the last row and the last column respectively . . . . . . . . . . .14
Table 2. Variables Info Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .20
Table 3. An example of a correlogram of data with a unit root present . . . .22
Table 4. Output of the Augmented Dickey-Fuller test. . . . . . . . . . . . . . . . . . .22
Table 5. A correlogram of U.S. GDP after it has been transformed into
a stationary variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .23
Table 6. The Augmented Dickey-Fuller test output testing the 1st
difference of U.S. GDP for stationarity (only relevant
information included) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .23
Table 7. A correlation matrix for the number of U.S. FDI firms and the
GDP in two regions in Poland. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .24
Table 8. Descriptive statistics of U.S. imports, U.S. exports and
a dummy variable for recession . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .26
Table 9. A summary of information for the U.S. GDP variable . . . . . . . . . . . .27
Table 10. Dummy variable creation: European Union
membership example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .27
Table 11. Dummy variable creation: Sale price example, original data set . . .28
Table 12. Dummy variable creation: Sale price example,
transformed data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28
Table 13. Dummy variable creation: Sale price example, transformed,
version 2, data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29
Table 14. Supplementing the missing data example, original data set . . . . .31
Table 15. A section of the Chi-square table with error levels in the first
row and degrees of freedom in the first column . . . . . . . . . . . . . . . . . . .36
Table 16. An example of a correlogram output for the U.S.
imports model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .41
Table 17. An example of the Breusch-Godfrey Serial Correlation LM
test output for the U.S. imports model . . . . . . . . . . . . . . . . . . . . . . . . . .42
Table 18. An example of the White heteroscedasticity LM test for the
U.S. imports model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44
Table 19. Coefficient estimation output from the software after
estimating the U.S. imports model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .48
Table 20. Summary of the coefficient testing procedure for one-tail tests. . .50
Table 21. Summary of the coefficient testing procedure for two-tail tests . .51
Table 22. Model’s statistics output from the software after estimating
the U.S. imports by regressing them on the constant term (C),
disposable income (YD), U.S. population (POP), wealth (W), U.S.
GDP and U.S. exports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .53
Table 23. Correlogram of the U.S. imports variable after it has been
differenced . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .59
Table 24. ARIMA (1, 1, 2) model output for the U.S. imports variable.
Note that the dependent variable is not IM but d(IM) – the first
difference of the original dependent variable . . . . . . . . . . . . . . . . . . . . .60
Table 25. Potential independent variables . . . . . . . . . . . . . . . . . . . . . . . . . . .68
Table 26. Hypothesis statements for all independent variables . . . . . . . . . . .70
Table 27. Correlation Matrix for all variables . . . . . . . . . . . . . . . . . . . . . . . . .71
Table 28. Hypothesis statements for the Unit Root tests . . . . . . . . . . . . . . . .72
Table 29. Results of the Augmented Dickey-Fuller test for the presence
of a Unit Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .72
Table 30. Values of the restricted model’s parameters. . . . . . . . . . . . . . . . . .74
Table 31. Values of the restricted model’s statistics. . . . . . . . . . . . . . . . . . . .74
Table 32. Results of the Breusch-Godfrey Serial Correlation Lagrange
Multiplier test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .74
Table 33. Results of the auxiliary regression (1) . . . . . . . . . . . . . . . . . . . . . . .75
Table 34. Stationarity test for CHG.INV and RECES variables . . . . . . . . . . . . .76
Table 35. Values of the unrestricted model’s parameters . . . . . . . . . . . . . . .77
Table 36. Values of the unrestricted model’s statistics. . . . . . . . . . . . . . . . . .77
Table 37. Breusch-Godfrey Serial Correlation Lagrange Multiplier test
for the final model (1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78
Table 38. Breusch-Godfrey Serial Correlation Lagrange Multiplier test
for the final model (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78
Table 39. Breusch-Godfrey Serial Correlation Lagrange Multiplier test
for the final model (3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78
Table 40. Values of the corrected unrestricted model’s parameters. . . . . . . .78
Table 41. Values of the corrected unrestricted model’s statistics . . . . . . . . . .79
Table 42. White heteroscedasticity test for the final model . . . . . . . . . . . . . .79
Table 43. Descriptive Statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .82