PPA 207: Quantitative Methods
Meeting 5, Spring 2004
1. Homework
Studenmund, Chapter 4, Number 7
a. The regression coefficient on C, or 0.002, indicates that a one-percentage-point
increase in the share of the labor force that is non-white is associated with a 0.002
percentage-point increase in the labor force participation rate of males age 25 to 54,
holding the other variables constant. The regression coefficient on D, or -0.80,
indicates that the labor force participation rate of males age 25 to 54 in the South
is 0.80 percentage points lower than in all other areas of the country.
b. Perfect collinearity between two independent variables implies that they really
are the same variable, that one is a multiple of the other, or that a constant has
been added to one of the variables. None of the independent variables appears to
satisfy this condition.
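One way to see why perfect collinearity is a problem is to check the sums of squares and cross-products that least squares relies on. The sketch below is only an illustration with made-up numbers (the variable names and values are assumptions, not data from the Studenmund problem). When one variable is a multiple of the other, or the other plus a constant, the determinant is zero and the two coefficients cannot be estimated separately.

```python
def cross_product_determinant(x1, x2):
    """Determinant of the 2x2 matrix of centered sums of squares and cross-products.

    A value of zero means x1 and x2 are perfectly collinear (given a constant term).
    """
    mean1 = sum(x1) / len(x1)
    mean2 = sum(x2) / len(x2)
    s11 = sum((a - mean1) ** 2 for a in x1)
    s22 = sum((b - mean2) ** 2 for b in x2)
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    return s11 * s22 - s12 ** 2

# Made-up illustrative values.
x1 = [1.0, 3.0, 5.0, 7.0, 9.0]
multiple = [3.0 * v for v in x1]       # one variable is a multiple of the other
shifted = [v + 10.0 for v in x1]       # a constant has been added
unrelated = [1.0, 9.0, 2.0, 8.0, 3.0]  # no exact linear relationship

det_multiple = cross_product_determinant(x1, multiple)
det_shifted = cross_product_determinant(x1, shifted)
det_unrelated = cross_product_determinant(x1, unrelated)
print(det_multiple, det_shifted, det_unrelated)
```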
c. A biased regression estimator is one whose distribution of possible values is
not centered on the true value of the regression coefficient. If you run the same
regression using two different samples of data, different estimated coefficients can
result for the same independent variable without meaning that the estimator is
biased. This is because each estimated coefficient is a single draw from a
distribution of possible values that is centered on the true value. But if you
wish to compare coefficients on D calculated for different decades you need to
be careful in your interpretation. One represents an unbiased estimate of the
effect of the South on labor force participation in the 1960s and the other an
estimate of that effect in another decade.
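The difference between sampling variation and bias can be sketched with a small simulation. Everything here is an assumption made for illustration (a true slope of 2.0, 30 observations per sample, normally distributed errors). The point is only that estimates differ from sample to sample while their average sits close to the true value.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

TRUE_SLOPE = 2.0  # assumed true coefficient for the illustration

def simulate_slopes(n_obs, n_samples, seed):
    """Estimate the slope on many independent samples from the same model."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        x = [rng.uniform(0, 10) for _ in range(n_obs)]
        y = [1.0 + TRUE_SLOPE * xi + rng.gauss(0, 2) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

slopes = simulate_slopes(n_obs=30, n_samples=1000, seed=207)
mean_slope = sum(slopes) / len(slopes)
print(f"smallest estimate: {min(slopes):.3f}")
print(f"largest estimate:  {max(slopes):.3f}")
print(f"average estimate:  {mean_slope:.3f}")
```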
d. This is a confusing statement that could be answered in at least two ways. The
first is to recognize that 94.2 is the constant term in the regression and has
nothing to do with the average participation rate. As stated in the chapter, its value
changes to satisfy the requirement that the mean of the residuals equal zero.
Therefore, I disagree with the statement. The second is that 94.2 represents the
average predicted value for L from the regression and, if it really is way above the
actual average value for L, then some of the conditions necessary for the regression
coefficients to be BLUE may have been violated (omitted independent variables,
residuals that are correlated with a variable, etc.). Therefore, I agree with the
statement.
Studenmund, Chapter 4, Number 9
a. Assumption III (all explanatory variables are uncorrelated with the error term).
The reason for this is that P (price) cannot be considered a purely independent or
exogenous variable. P, along a demand curve, does not change independently
and changes when a factor that shifts the supply curve changes. The best way
to model this is to also specify a supply-based relationship between Y and P and
use appropriate simultaneous equation methods we will discuss later.
Alternatively, you could assume that P is set exogenously (as when a regulator
dictates the price of the good).
b. When an ad is placed in the school newspaper, yogurt sales rise by 134.3 during
the time the ad is in place.
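The violation of Assumption III in part a can be illustrated with a simulated market. The sketch below assumes a demand curve with a true slope of -1.0 and a supply curve with a slope of 1.0; these equations, shock sizes, and names are hypothetical and are not taken from the yogurt problem. Because price is determined jointly with sales, it is correlated with the demand error, and a regression of sales on price does not recover the demand slope.

```python
import random

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

TRUE_DEMAND_SLOPE = -1.0  # assumed for the illustration

rng = random.Random(9)
prices, sales = [], []
for _ in range(5000):
    u = rng.gauss(0, 5)  # demand shock (the error term in the demand equation)
    v = rng.gauss(0, 5)  # supply shock
    # Demand: Y = 100 + TRUE_DEMAND_SLOPE * P + u
    # Supply: Y = 20 + 1.0 * P + v
    # Setting the two equal gives the market-clearing price.
    p = (80 + u - v) / (1.0 - TRUE_DEMAND_SLOPE)
    y = 100 + TRUE_DEMAND_SLOPE * p + u
    prices.append(p)
    sales.append(y)

estimated_slope = ols_slope(prices, sales)
print(f"true demand slope: {TRUE_DEMAND_SLOPE}, OLS estimate: {estimated_slope:.2f}")
```

With equal-sized demand and supply shocks, the estimate lands near zero rather than near -1.0, which is why a simultaneous equation method is the better approach.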
c. Ct represents a dummy variable for non-summertime. Sales rise by 152.1
during the summer months.
d. There are taste variables in high temperature, ad placed, and non-summer
months. There is no income variable. If the store stays in the same neighborhood
over the period examined, and there is no great movement of households in and out
of the neighborhood, income differences could be proxied by the unemployment rate
in the city. You could also try to account for the impact of the price of substitutes
by including the average price of yogurt within a few miles of the store, or the
number of stores selling yogurt.
Pollock, Chapter 4, Numbers 1 and 2
Covered in class
2. Studenmund, Chapter 5, Basic Statistics and Hypothesis Testing

Test theories with data from real world; is result from data likely due to
chance?
Does a policy change exert an independent impact?

Regression coefficient derived is only one from a distribution of estimates

Statistical inference: draw conclusions about entire population from a
sample drawn from population

Null and alternative hypotheses
Null: range of values of regression coefficient if theory is not correct
Hypothesis that researcher does not believe
Alternative: range of values of regression coefficient if theory correct
Set up this way so that we can control the probability of rejecting the null
hypothesis when it is actually true
One-sided test
H0: β ≥ 0
HA: β < 0
Or
Two-sided test (most of what we will do)
H0: β = 0
HA: β ≠ 0
Look at Figure 5.1 to understand meaning

Type I and type II errors
Type I: Reject a true null hypothesis
Type II: Do not reject a false null hypothesis
Different than book’s example (wrong)
H0: Defendant is guilty
HA: Defendant is innocent
Type I error: letting a guilty defendant go free
Type II error: sending an innocent defendant to jail
We prefer Type I errors to Type II in our criminal justice system
But there is a tradeoff: lowering the chance of one error raises the chance of the other
In most statistical applications we prefer to minimize the chance of a Type I error
See Figure 5.3 and 5.4
Decision rule and use of a critical value
Keep tails of distribution (rejection region) small
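The link between the rejection region and the chance of a Type I error can be checked by simulation. The sketch below assumes a simple regression with 30 observations and a true slope of zero, so the null hypothesis is true in every trial; the model and sample sizes are made up for illustration. The critical value 2.048 is the two-sided 5 percent value for 28 degrees of freedom from a standard t table.

```python
import math
import random

T_CRITICAL = 2.048  # two-sided 5% critical t value, 28 degrees of freedom

def slope_t_stat(x, y):
    """t statistic for H0: slope = 0 in a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope / se_slope

def type_one_error_rate(n_trials, seed):
    """Share of trials in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        x = [rng.uniform(0, 10) for _ in range(30)]
        y = [5.0 + rng.gauss(0, 1) for _ in x]  # y does not depend on x
        if abs(slope_t_stat(x, y)) > T_CRITICAL:
            rejections += 1
    return rejections / n_trials

rate = type_one_error_rate(n_trials=2000, seed=5)
print(f"share of true nulls rejected: {rate:.3f}")
```

The share should come out close to 0.05, the level of significance implied by the critical value.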

t-test (Equation 5.2 for midterm exam)
$t_k = \dfrac{\hat{\beta}_k - \beta_{H_0}}{SE(\hat{\beta}_k)}$

Critical t value (tc)
Selected from Table B-1
One or two-sided test?
Degrees of freedom: N – K – 1
Level of significance (probability of Type I error)
5 or 10% are usually used
Reject H0 if
|tk| > tc
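The two-sided decision rule can be written out in a few lines. The coefficient, standard error, and sample size below are hypothetical numbers chosen for illustration. The critical value 2.048 is the two-sided 5 percent value for 28 degrees of freedom, read from a standard t table.

```python
def degrees_of_freedom(n_obs, k_independent_vars):
    """N - K - 1, where K counts the independent variables (not the constant)."""
    return n_obs - k_independent_vars - 1

def t_statistic(beta_hat, beta_null, se_beta_hat):
    """t value for an estimated coefficient against the null-hypothesis value."""
    return (beta_hat - beta_null) / se_beta_hat

def reject_null_two_sided(t_value, t_critical):
    """Two-sided rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_value) > t_critical

# Hypothetical regression output.
df = degrees_of_freedom(n_obs=30, k_independent_vars=1)
t_k = t_statistic(beta_hat=0.5, beta_null=0.0, se_beta_hat=0.2)
t_c = 2.048  # looked up in the t table for df = 28, 5% two-sided
decision = reject_null_two_sided(t_k, t_c)
print(df, t_k, decision)
```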

Level of confidence (1 – level of significance)
95 or 90% are usually used

Confidence interval (Equation 5.5 for midterm exam)
$\hat{\beta} \pm t_c \cdot SE(\hat{\beta})$
Very valuable for policy recommendations
Get in the habit of reporting it instead of a single coefficient value
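A confidence interval takes only one line of arithmetic once the critical t value has been looked up. The sketch below uses hypothetical values (a coefficient of 0.5, a standard error of 0.2, and the two-sided 5 percent critical value of 2.048 for 28 degrees of freedom).

```python
def confidence_interval(beta_hat, t_critical, se_beta_hat):
    """Lower and upper bounds: beta_hat minus and plus t_c times SE(beta_hat)."""
    margin = t_critical * se_beta_hat
    return beta_hat - margin, beta_hat + margin

# Hypothetical regression output; 2.048 gives a 95% interval when df = 28.
lower, upper = confidence_interval(beta_hat=0.5, t_critical=2.048, se_beta_hat=0.2)
print(f"95% confidence interval: {lower:.4f} to {upper:.4f}")
```

In this example the interval runs from about 0.09 to about 0.91. Reporting that range tells a policy audience how imprecise the estimate is, which a single coefficient value does not.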

Work through example (Section 5.3.1) of one-sided tests

Work through example (Section 5.3.2) of two-sided tests

Simple correlation coefficient (Equation 5.8 for midterm exam)
$r_{12} = \dfrac{\sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum (X_{1i} - \bar{X}_1)^2 \sum (X_{2i} - \bar{X}_2)^2}}$
Measure of collinearity
Varies from -1 to 1
Do not worry about calculating the corresponding t statistic
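The simple correlation coefficient can be computed directly from its definition. The two short series below are made-up numbers used only to show the arithmetic.

```python
import math

def simple_correlation(x1, x2):
    """Simple correlation coefficient r12 between two variables."""
    mean1 = sum(x1) / len(x1)
    mean2 = sum(x2) / len(x2)
    cross = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    ss1 = sum((a - mean1) ** 2 for a in x1)
    ss2 = sum((b - mean2) ** 2 for b in x2)
    return cross / math.sqrt(ss1 * ss2)

# Made-up illustrative values.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0]

r12 = simple_correlation(x1, x2)
r_same = simple_correlation(x1, x1)                      # a variable with itself
r_opposite = simple_correlation(x1, [-v for v in x1])    # a variable with its negative
print(round(r12, 4), r_same, r_opposite)
```

A variable paired with itself and with its own negative gives the two ends of the range.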

Things the t-test does not test
Theoretical validity
A statistically significant result is not a theoretically correct one
Only appropriate in the context of a reasonable theory
Can we reject the null hypothesis
Importance
This is measured by the magnitude of the coefficient
Later on we will calculate elasticities
For samples drawn from the entire population
Rarely need to worry about this in policy analysis

F-Test of overall significance (Equation 5.14 for midterm exam)
$F = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2 / K}{\sum e_i^2 / (n - K - 1)}$
H0: βN = βP = βI = 0
HA: H0 not true
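The F statistic is the explained sum of squares per independent variable divided by the residual sum of squares per degree of freedom. The sketch below works through it for a simple regression (K = 1) on made-up data. The data and function names are assumptions for illustration, and the result would still have to be compared with the critical F value for K and n – K – 1 degrees of freedom from the F table.

```python
def fit_simple_regression(x, y):
    """Intercept and slope from a least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope

def f_statistic(y, y_hat, k_independent_vars):
    """(ESS / K) / (RSS / (n - K - 1)) for the test of overall significance."""
    n = len(y)
    mean_y = sum(y) / n
    ess = sum((fitted - mean_y) ** 2 for fitted in y_hat)
    rss = sum((actual - fitted) ** 2 for actual, fitted in zip(y, y_hat))
    return (ess / k_independent_vars) / (rss / (n - k_independent_vars - 1))

# Made-up illustrative values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_simple_regression(x, y)
y_hat = [intercept + slope * xi for xi in x]
f_value = f_statistic(y, y_hat, k_independent_vars=1)
print(f"F = {f_value:.2f}")
```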
4. Play around with Wassmer’s Sprawl Data

Read UA 2000 All Data Formatted into Excel spreadsheet
Go to 2000 UA and save land area (B) and urban fringe land area (D)
Go to 2000 UA Continued and save workers 16 years plus (AK) and 90+
minutes of commute time (AX)
Merge into one spreadsheet and rename

Read into SPSS
Transform variables
Run regression
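The merge and transform steps can also be sketched outside of Excel and SPSS. Everything in the block below is a placeholder: the area names and numbers are invented, and the two transformed variables (fringe share of land area and share of workers commuting 90 minutes or more) are only one guess at a sensible transformation. The actual values come from columns B, D, AK, and AX of the UA 2000 sheets.

```python
# Placeholder rows standing in for the "2000 UA" sheet (columns B and D).
land = {
    "Area A": {"land_area": 100.0, "fringe_area": 40.0},
    "Area B": {"land_area": 200.0, "fringe_area": 120.0},
    "Area C": {"land_area": 150.0, "fringe_area": 30.0},
    "Area D": {"land_area": 90.0, "fringe_area": 45.0},
}

# Placeholder rows standing in for the "2000 UA Continued" sheet (columns AK and AX).
commute = {
    "Area A": {"workers": 50000, "commute_90_plus": 1000},
    "Area B": {"workers": 80000, "commute_90_plus": 2400},
    "Area C": {"workers": 60000, "commute_90_plus": 600},
}

def merge_by_area(left, right):
    """Keep only areas present in both sheets and combine their columns."""
    return {name: {**left[name], **right[name]} for name in left if name in right}

def add_shares(rows):
    """Transform raw counts into shares (an assumed choice of transformation)."""
    for row in rows.values():
        row["fringe_share"] = row["fringe_area"] / row["land_area"]
        row["long_commute_share"] = row["commute_90_plus"] / row["workers"]
    return rows

merged = add_shares(merge_by_area(land, commute))
for name, row in merged.items():
    print(name, row["fringe_share"], row["long_commute_share"])
```

Areas missing from either sheet drop out of the merge, so it is worth checking the number of rows before running the regression.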
5. Homework Due the Start of Meeting Six
(1) Read all of the material under meeting six in the syllabus; come prepared to
discuss.
(2) A typed and well developed question from reading assignment for week five.
(3) Answer questions 13 and 15 in Studenmund, Chapter 5, typed on separate
pages of paper.
(4) A heads up that next week’s homework will be to collect data for your paper’s
dependent variable and turn in an SPSS printout of descriptive statistics on it. I
would suggest getting a head start on it this week.