Computer Assignment 2 – Economic Growth, Correlation, and Regression
Statistics 1040 – Dr. McGahagan
The dataset for this assignment is based on a well-known article by N. Gregory Mankiw, David Romer and
David N. Weil, “A Contribution to the Empirics of Economic Growth,” Quarterly Journal of Economics,
107(2):407-437 (1992).
To download the files you will need, go to the course web page. On the statistics home page, click on
“Data files for R”, just under “The R statistical language”, and then click on “Economic Growth data.”
On that page, right-click the links to the files “ecgrow.tab” and “Read.ecgrow.R” and save them to your
R directory (not the default Downloads directory; by default, your R directory is your home directory).
Once in R, you can load the files with the command:
>>> use("ecgrow")
If this doesn't work, download the newer Stat1040.R file from the web with the command:
>>> get1040()
and try again. If it still doesn't work, try the command:
>>> source("Read.ecgrow.R")
You should see some helpful output when you first load the file. You can display it again at any time
with the command:
>>> help.ecgrow()
Do this more than once: I will ask a question or five on the next exam about the variables and their
correlations (not the exact numbers, but the direction and rough magnitude).
To get a sense of what the file looks like, use the command:
>>> ecgrow[1:4, ]
which returns the first 4 rows and all columns of the ecgrow data frame.
(Note that the variable names in the top row are not counted among the data frame's rows.)
It will look like this:
isocode  country    grate  invest   edu  pop  gdp60  gdp85  oecd  openness  region
DZA      Algeria      2.1    23.6   2.4  2.9     26     31  No           0  N.Africa
ARG      Argentina    0.5    14.6   6.7  1.4     47     38  No           0  S.America
AUS      Australia    1.5    26.9  10.2  1.5     78     80  Yes          1  E.Asia
AUT      Austria      3.0    24.7   6.6  0.2     44     65  Yes          1  Europe
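If the indexing notation is unfamiliar, the sketch below rebuilds the four rows printed above as a small data frame and shows the three forms used in this assignment: [rows, columns], the dollar sign, and selecting rows by a condition. The name sample_rows is hypothetical; in your session you would use ecgrow, or a regional data frame such as s.africa, directly.

```r
# Hypothetical stand-in for ecgrow: the four rows printed above, typed in by hand.
sample_rows <- data.frame(
  isocode  = c("DZA", "ARG", "AUS", "AUT"),
  country  = c("Algeria", "Argentina", "Australia", "Austria"),
  grate    = c(2.1, 0.5, 1.5, 3.0),
  invest   = c(23.6, 14.6, 26.9, 24.7),
  edu      = c(2.4, 6.7, 10.2, 6.6),
  pop      = c(2.9, 1.4, 1.5, 0.2),
  gdp60    = c(26, 47, 78, 44),
  gdp85    = c(31, 38, 80, 65),
  oecd     = c("No", "No", "Yes", "Yes"),
  openness = c(0, 0, 1, 1),
  region   = c("N.Africa", "S.America", "E.Asia", "Europe"),
  stringsAsFactors = FALSE
)

sample_rows[1:2, ]                                  # [rows, columns]: rows 1-2, every column
sample_rows[, c("country", "grate")]                # every row, two named columns
sample_rows$grate                                   # $ pulls out one column as a vector
sample_rows[sample_rows$oecd == "Yes", "country"]   # rows that meet a condition
```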
Must have #1: Attach the output of the command >>> stats(your.region) to your paper.
For example, if you were assigned Algeria, use the command >>> stats(n.africa)
Must have #2: Explain in a paragraph how your country relates to its region and to the world.
Describe its growth rate, investment, education level, GDP in 1960 and 1985, and whether or not it pursued
open trade policies, and compare each of these with the regional and world averages.
Note in particular whether the country is in the top or bottom quartile of its region or of the world on each measure.
Comment briefly on any surprises the comparison reveals. For example, is the country growing slowly despite
high investment, high educational levels, or open trade policies, or growing rapidly despite a high initial GDP?
If you run into what seem like puzzles, a brief visit to the relevant Wikipedia article may give you something to
add to your paper. Not much is necessary: a civil war or the discovery of oil reserves, for example, may be enough
for a brief explanation.
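To decide whether your country falls in the top or bottom quartile, compare its value with the 25th and 75th percentiles of the regional or world values. A minimal base-R sketch follows; quartile_position is a hypothetical helper, not part of the course files, and the data are the growth rates of the four sample rows printed earlier.

```r
# Hypothetical helper (not in the course files): where does value x fall
# relative to the quartiles of a reference vector ref?
quartile_position <- function(x, ref) {
  q <- quantile(ref, probs = c(0.25, 0.75), na.rm = TRUE)
  if (x <= q[[1]]) {
    "bottom quartile"
  } else if (x >= q[[2]]) {
    "top quartile"
  } else {
    "middle half"
  }
}

# Growth rates of the four sample countries shown earlier.
grate <- c(Algeria = 2.1, Argentina = 0.5, Australia = 1.5, Austria = 3.0)
sapply(grate, quartile_position, ref = grate)
```

In your session, ref would be the world vector (ecgrow$grate) or your region's (for example n.africa$grate), and x your country's value. quantile() uses R's default definition of quartiles; the course's stats() output may compute them slightly differently.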
Must have #3: A boxplot and/or density plot comparing the growth rates of the world and your region.
Examples: >>> Boxplot(grate, s.america$grate) or >>> Densityplot(grate, s.america$grate)
The dollar sign refers to a variable (column) within that regional data frame.
In your text, identify any outliers for your region with the command:
>>> outliers(s.america$grate, s.america$country)
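The course's Boxplot() and outliers() functions do this for you. As a base-R sketch of the same idea, assuming outliers() follows the usual boxplot rule (which may not be exactly the case), boxplot() draws the comparison and boxplot.stats() returns the points beyond the whiskers. The data and the helper name flag_outliers are hypothetical.

```r
# Growth rates of the four sample countries (a stand-in for the world data).
world_grate   <- c(2.1, 0.5, 1.5, 3.0)
world_country <- c("Algeria", "Argentina", "Australia", "Austria")
# An arbitrary subset standing in for a regional column such as s.america$grate.
region_grate  <- world_grate[c(1, 2)]

boxplot(list(World = world_grate, Region = region_grate),
        ylab = "Growth rate (%)")

# Hypothetical helper: label the points boxplot() would draw as outliers,
# i.e. those more than 1.5 box-lengths beyond the hinges.
flag_outliers <- function(x, labels) {
  out <- boxplot.stats(x)$out
  labels[x %in% out]
}
flag_outliers(world_grate, world_country)   # none of these four is flagged
```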
Must have #4: A scatter plot of the relation between investment and growth rate for your region.
For example, if you have a Sub-Saharan country, you can use the command:
>>> with(s.africa, Plot(invest, grate))
You should also identify your country and any unusual countries with the command:
>>> with(s.africa, identify(invest, grate, country))
Include brief answers to the following:
Does the plot seem to show a strong or weak correlation? Is the correlation positive or negative?
Try to guess the correlation's value, then add the exact value to your paper, using the command:
>>> with(s.africa, corr(invest, grate))
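The course's corr() presumably reports the usual Pearson correlation (an assumption). The sketch below draws the scatter plot, labels the points with text() instead of the interactive identify(), and checks cor() against its definition, using the four sample rows as stand-in data.

```r
# Investment and growth for the four sample countries printed earlier
# (in your session: with(s.africa, ...) on your regional data frame).
invest <- c(23.6, 14.6, 26.9, 24.7)
grate  <- c(2.1, 0.5, 1.5, 3.0)
iso    <- c("DZA", "ARG", "AUS", "AUT")

plot(invest, grate, xlab = "Investment (% of GDP)", ylab = "Growth rate (%)")
text(invest, grate, labels = iso, pos = 3)   # fixed labels; identify() needs mouse clicks

r <- cor(invest, grate)                      # Pearson correlation
# The same number from its definition: covariance / (sd_x * sd_y).
r_by_hand <- cov(invest, grate) / (sd(invest) * sd(grate))
round(c(r = r, by_hand = r_by_hand), 2)      # both about 0.70 for these four rows
```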
Must have #5: Compute a regression line and add it to your plot.
Treat the growth rate as the dependent variable and the percent of GDP invested as the independent
variable. Estimate a linear regression model (in R it is called simply a linear model) with the following
command:
>>> model <- with(s.africa, lm(grate ~ invest))
Issue the command >>> model
and you will see the coefficients of your regression model. Include these in your paper in the form of a
linear equation. For the sub-Saharan Africa data, the equation will look something like the following (the
numbers have been changed a bit, so students assigned a sub-Saharan country will have to actually do this):
grate = -0.5 + 0.2 * invest
To add the line to the plot, use the command abline (so named because the general form of a linear
equation is y = a + b*x), substituting your own intercept and slope:
>>> abline(-0.5, 0.2, col="red", lwd=4)
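You can also let R supply the coefficients: abline() accepts a fitted lm model directly, so there is no need to retype the intercept and slope. A sketch using the four sample rows as stand-in data (your own numbers will differ):

```r
# Four sample rows printed earlier, standing in for a regional data frame.
invest <- c(23.6, 14.6, 26.9, 24.7)
grate  <- c(2.1, 0.5, 1.5, 3.0)

model <- lm(grate ~ invest)        # grate is the dependent variable
b <- coef(model)                   # named vector: (Intercept), invest
cat(sprintf("grate = %.2f + %.3f * invest\n", b[["(Intercept)"]], b[["invest"]]))
# prints: grate = -1.27 + 0.136 * invest

plot(invest, grate)
abline(model, col = "red", lwd = 4)   # draws the fitted line from the model object
```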
Use this equation to predict your country's growth rate from its investment rate. For example, if your
country invested 20 percent of GDP, your prediction would be:
estimated grate = -0.5 + 0.2 * 20 = -0.5 + 4 = 3.5
But we also have data on actual growth rates. If your country's actual growth rate were 5 percent, the
prediction error, or residual, would be:
residual = actual grate - estimated grate = 5 - 3.5 = 1.5
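The worked arithmetic above, written out in R with the illustrative coefficients from the text (-0.5 and 0.2, not real estimates) and a hypothetical country that invests 20 percent of GDP and grows at 5 percent. The second half shows that predict() and residuals() perform the same calculation for a fitted model, here fitted to the four sample rows.

```r
# Illustrative coefficients from the example equation in the text.
a <- -0.5                      # intercept
b <-  0.2                      # slope on invest
invest_mine  <- 20             # hypothetical country: invests 20% of GDP
grate_actual <- 5              # hypothetical actual growth rate (%)

grate_hat <- a + b * invest_mine         # -0.5 + 0.2 * 20 = 3.5
residual  <- grate_actual - grate_hat    # 5 - 3.5 = 1.5

# With a fitted model, predict() and residuals() do the same arithmetic.
invest <- c(23.6, 14.6, 26.9, 24.7)      # four sample rows printed earlier
grate  <- c(2.1, 0.5, 1.5, 3.0)
model  <- lm(grate ~ invest)
predict(model, newdata = data.frame(invest = 20))
residuals(model)                         # actual minus fitted, one per country
```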
Add the basic statistics on the residuals to your paper with the command >>> stats(model$residuals)
Check whether the residuals are approximately normal, using the QQnorm plot or the Densityplot command.
You need not include these plots in your paper, but state in your writeup whether the residuals look normal.
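A base-R sketch of the residual checks, using the four sample rows as stand-in data. summary() is used here in place of the course's stats(), and shapiro.test() is an optional formal test that goes beyond what the assignment asks. With only a handful of countries, none of these checks is very powerful.

```r
invest <- c(23.6, 14.6, 26.9, 24.7)   # four sample rows printed earlier
grate  <- c(2.1, 0.5, 1.5, 3.0)
model  <- lm(grate ~ invest)
res    <- residuals(model)

summary(res)                 # min, quartiles, mean, max of the residuals
qqnorm(res); qqline(res)     # points close to the line suggest normality
plot(density(res), main = "Residual density")
shapiro.test(res)            # formal normality test (needs 3 to 5000 values)
```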
The above items are all that is expected in your paper. Please answer them carefully. The explanations
should be clearly expressed in grammatical English. Present them in word-processed (not handwritten) form,
with scatterplots attached.
Must have #7: Identify the key variables in your dataset, those that are most influential in economic
growth. Correlation and regression will be the key tools for this.
Compare these to the variables that are most influential in the rest of the world.
First, create a Rest of the World data frame (ROW) by using the command:
>>> ROW
(The last term should of course be replaced by your region. To find the term to substitute, use the
command >>> levels(ecgrow$region).)
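Whatever exact form the course's command takes, the idea is to keep every country whose region label differs from yours. A base-R sketch, assuming the label is stored in the region column (as in the sample rows printed earlier) and using a hypothetical four-row stand-in, ecgrow_demo, in place of ecgrow:

```r
# Hypothetical stand-in for ecgrow, built from the four sample rows.
ecgrow_demo <- data.frame(
  country = c("Algeria", "Argentina", "Australia", "Austria"),
  grate   = c(2.1, 0.5, 1.5, 3.0),
  region  = factor(c("N.Africa", "S.America", "E.Asia", "Europe"))
)

levels(ecgrow_demo$region)    # the exact spelling of each region label

my_region <- "S.America"      # replace with your own region's label
region_df <- ecgrow_demo[ecgrow_demo$region == my_region, ]   # your region
row_df    <- ecgrow_demo[ecgrow_demo$region != my_region, ]   # rest of the world
```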
Next, look carefully at the correlations between growth rates and other variables:
>>> corr(grate, mena)
>>> corr(grate, ROW)
If you have installed the “corrplot” package with the command >>> install.packages("corrplot"),
you can also plot the correlations:
>>> corr.plot(mena)
>>> corr.plot(ROW)
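The course's corr() and corr.plot() appear to work on a whole data frame at once. In base R the analogous step is cor() applied to the numeric columns, which returns a correlation matrix whose "grate" row holds growth's correlation with every other variable. A sketch with the four sample rows as a hypothetical stand-in (with so few rows the values are only illustrative):

```r
# Hypothetical stand-in for a regional data frame: four sample rows printed earlier.
demo <- data.frame(
  country = c("Algeria", "Argentina", "Australia", "Austria"),
  grate   = c(2.1, 0.5, 1.5, 3.0),
  invest  = c(23.6, 14.6, 26.9, 24.7),
  edu     = c(2.4, 6.7, 10.2, 6.6),
  pop     = c(2.9, 1.4, 1.5, 0.2),
  gdp60   = c(26, 47, 78, 44),
  stringsAsFactors = FALSE
)

num <- sapply(demo, is.numeric)                # cor() needs numeric columns only
cm  <- cor(demo[, num], use = "complete.obs")  # drop rows with missing values
round(cm["grate", ], 2)                        # growth vs. every other variable
# With corrplot installed, corrplot::corrplot(cm) draws this matrix.
```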
Describe how growth rates are related to each of the other variables (for example, education or GDP in 1960),
in your region and in the rest of the world. Pay special attention to cases in which the correlations have
opposite signs or differ significantly (a difference of more than 0.20 is a rule of thumb for a likely
significant difference between correlations).
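How large a gap between two correlations is significant depends on how many countries are in each group. For a more formal check than the rule of thumb, a standard method for correlations from two independent samples (such as your region and the rest of the world) is Fisher's z-transformation. A minimal sketch; the helper name and the input values are hypothetical.

```r
# Hypothetical helper: Fisher z-test for two correlations from independent
# samples (e.g. your region vs. the rest of the world).
compare_correlations <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                            # Fisher z-transform of each r
  z2 <- atanh(r2)
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of z1 - z2
  z  <- (z1 - z2) / se
  c(z = z, p.value = 2 * pnorm(-abs(z)))     # two-sided p-value
}

# Made-up inputs: r = 0.50 among 20 regional countries,
# r = 0.10 among 80 rest-of-world countries.
compare_correlations(0.50, 20, 0.10, 80)
```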
Reasons for different correlations between data sets may include:
Political events: unrest in the Mideast may dampen the relation between investment and the growth rate.
Economic events: growth driven by natural resources is vulnerable to drops in commodity prices.
Guyana invested a reasonable amount but depended heavily on demand for bauxite; when bauxite prices
collapsed, so did Guyana's growth.
Lack of variation in the data: in S. America, openness has less impact than in the rest
of the world, perhaps because only a few countries pursued open trade policies.
Too few observations.
Look at selected regressions of growth rate on other variables that show a clear contrast between your region and the rest of the world.
For this assignment, focus on the strength of single regressions:
for example, >>> model
Questions to ask yourself:
What is the regression equation?
What is the standard error of the regression?
What is the predicted value of the growth rate if investment is at (say) the first quartile of world investment?
What does the R-squared tell you about how much of the variation in growth rates the regression explains?
Does the confidence interval of any coefficient include zero? If so, what does this mean?
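Each of these questions maps onto a standard piece of a fitted lm model in base R. A sketch using the four sample rows as stand-in data; in your session the quartile would come from the world data (for example quantile(ecgrow$invest, 0.25)) and the model from your region.

```r
invest <- c(23.6, 14.6, 26.9, 24.7)   # four sample rows printed earlier
grate  <- c(2.1, 0.5, 1.5, 3.0)
model  <- lm(grate ~ invest)
s      <- summary(model)

coef(model)                 # intercept and slope of the regression equation
s$sigma                     # standard error of the regression (residual SE)
s$r.squared                 # share of the variance in grate explained

q1 <- quantile(invest, 0.25)                        # first quartile of investment
predict(model, newdata = data.frame(invest = q1))   # predicted growth at that level

ci <- confint(model, level = 0.95)
ci    # with only four countries, the interval for invest includes zero
```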
Next, add more variables to the regression. This is a matter of trial and error; try, for example, the commands:
>>> with(e.asia, regress(grate ~ invest + openness + gdp60))
>>> with(s.america, regress(grate ~ invest + pop + edu))
and see how the results differ from those for the rest-of-the-world dataset.
Is either of these an improvement on the single-variable regression? If so, by how much, and why?
In answering this:
check whether each coefficient differs from zero (see whether its confidence interval excludes zero);
check whether the R-squared has improved. Use the R-squared as a way to find the best regression.
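The same comparison in base R, as a sketch on simulated data: the variable names mirror ecgrow, but the numbers are randomly generated, not real. Because ordinary R-squared never falls when a variable is added, the sketch also prints adjusted R-squared, which penalizes extra variables; summary() reports both. In your session you would use the course's regress() or lm() on your regional and rest-of-world data frames.

```r
set.seed(1)                                   # reproducible fake data
n   <- 40
sim <- data.frame(
  invest = runif(n, 5, 35),                   # % of GDP
  edu    = runif(n, 0, 12),
  gdp60  = runif(n, 5, 100)
)
# Synthetic growth rates for illustration only; NOT the ecgrow data.
sim$grate <- -1 + 0.12 * sim$invest + 0.15 * sim$edu -
  0.01 * sim$gdp60 + rnorm(n, sd = 1)

m1 <- lm(grate ~ invest, data = sim)                # single-variable model
m2 <- lm(grate ~ invest + edu + gdp60, data = sim)  # with two added variables

fit_stats <- function(m) {
  s <- summary(m)
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
}
round(rbind(single = fit_stats(m1), multiple = fit_stats(m2)), 3)
confint(m2)   # intervals that exclude zero point to coefficients that matter
```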