PPA 207: Quantitative Methods
Meeting 5, Spring 2004

1. Homework

Studenmund, Chapter 4, Number 7

a. The regression coefficient on C, 0.002, indicates that a one-percentage-point increase in the percent of the labor force that is non-white results in a 0.002 percentage-point increase in the labor force participation rate of males aged 25 to 54, holding the other independent variables constant. The regression coefficient on D, -0.80, indicates that the labor force participation rate of males aged 25 to 54 in the South is 0.80 percentage points lower than in all other areas of the country, holding the other independent variables constant.

b. Perfect collinearity between two independent variables implies that they are really the same variable, that one is a multiple of the other, or that a constant has been added to one of them. In general, one is an exact linear function of the other. None of the independent variables appears to satisfy this condition.

c. A biased regression estimator is one whose distribution of possible values is not centered on the true value of the regression coefficient. Suppose you run the same regression on two different samples of data. The estimated coefficients for the same independent variable can differ, and this does not mean that the estimator is biased. Each estimate is drawn from a distribution of possible values. For an unbiased estimator that distribution is centered on the true value, but any single estimate will generally differ from it. Still, be careful when you compare coefficients on D estimated for different decades. One is an unbiased estimate of the effect of being in the South on labor force participation in the 1960s. The other is an estimate of that effect in another decade, and the true effect itself may have changed.

d. This is a confusing statement that could be answered in at least two ways. The first is to recognize that 94.2 is the constant term in the regression and has nothing to do with the average participation rate. As stated in the chapter, its value adjusts to satisfy the requirement that the mean of the residuals equal zero. Therefore, I disagree with the statement.
The second is that 94.2 represents the average predicted value of L from the regression. If it really is far above the actual average value of L, then some of the conditions necessary for the regression coefficients to be BLUE may have been violated (omitted independent variables, residuals that are correlated with an independent variable, etc.). Therefore, I agree with the statement.

Studenmund, Chapter 4, Number 9

a. Assumption III (all explanatory variables are uncorrelated with the error term). The reason is that P (price) cannot be considered a purely independent, or exogenous, variable. Along a demand curve, P does not change independently; it changes when a factor that shifts the supply curve changes. The best way to model this is to also specify a supply-based relationship between Y and P and use the appropriate simultaneous-equation methods we will discuss later. Alternatively, you could assume that P is set exogenously (for example, by a regulator dictating the price of the good).

b. When an ad is placed in the school newspaper, yogurt sales rise by 134.3 while the ad is in place, holding the other independent variables constant.

c. C_t represents a dummy variable for non-summertime. Sales rise by 152.1 during the summer months.

d. The equation includes taste variables for high temperature, an ad being placed, and non-summer months, but there is no income variable. Suppose the store stayed in the same neighborhood over the period examined and there was no great movement of households into or out of the neighborhood. Then income differences could be proxied by the city's unemployment rate. You could also try to account for the price of substitutes by including the average price of yogurt at stores within a few miles, or the number of stores selling yogurt.

Pollock, Chapter 4, Numbers 1 and 2: covered in class.

2. Studenmund, Chapter 5, Basic Statistics and Hypothesis Testing

Test theories with data from the real world: is a result from the data likely due to chance? Does a policy change exert an independent impact?
The estimated regression coefficient is only one draw from a distribution of possible estimates.
Statistical inference: drawing conclusions about an entire population from a sample drawn from that population.

Null and alternative hypotheses:
Null: the range of values of the regression coefficient expected if the theory is not correct. This is the hypothesis the researcher does not believe.
Alternative: the range of values of the regression coefficient expected if the theory is correct.
The hypotheses are set up this way so that we can control the probability of rejecting the null hypothesis when it is actually true.
One-sided test: H0: β ≥ 0, HA: β < 0
Two-sided test (most of what we will do): H0: β = 0, HA: β ≠ 0
Look at Figure 5.1 to understand the meaning of each.

Type I and Type II errors:
Type I: rejecting a true null hypothesis.
Type II: not rejecting a false null hypothesis.
An example different from the book's (which is wrong): H0: the defendant is guilty; HA: the defendant is innocent.
Type I error: letting a guilty defendant go free.
Type II error: sending an innocent defendant to jail.
Our criminal justice system prefers Type I errors to Type II errors. There is a tradeoff, however: for a given sample, lowering the chance of one type of error raises the chance of the other.
In most statistical applications we prefer to minimize the chance of a Type I error. See Figures 5.3 and 5.4.

Decision rule and use of a critical value:
Keep the tails of the distribution (the rejection region) small.
t-test (Equation 5.2, for the midterm exam): $t_k = \dfrac{\hat{\beta}_k - \beta_{H_0}}{SE(\hat{\beta}_k)}$
Critical t value ($t_c$): selected from Table B-1.
One-sided or two-sided test?
Degrees of freedom: N − K − 1.
Level of significance (the probability of a Type I error): 5% or 10% is usually used.
Reject H0 if $|t_k| > t_c$. For a one-sided test, $t_k$ must also have the sign implied by HA.
Level of confidence (1 − level of significance): 95% or 90% is usually used.
Confidence interval (Equation 5.5, for the midterm exam): $\hat{\beta}_k \pm t_c \cdot SE(\hat{\beta}_k)$
The confidence interval is very valuable for policy recommendations. Get in the habit of reporting it instead of a single coefficient value.
Work through the example of one-sided tests (Section 5.3.1).
Work through the example of two-sided tests (Section 5.3.2).

Simple correlation coefficient (Equation 5.8, for the midterm exam):
$r_{12} = \dfrac{\sum (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sqrt{\sum (X_{1i} - \bar{X}_1)^2 \sum (X_{2i} - \bar{X}_2)^2}}$
A measure of collinearity; it varies from −1 to +1.
Do not worry about calculating the corresponding t-statistic.

Things the t-test does not test:
Theoretical validity: a statistically significant result is not necessarily a theoretically correct one. The t-test is only appropriate in the context of a reasonable theory; it tells us only whether we can reject the null hypothesis.
Importance: this is measured by the magnitude of the coefficient. Later on we will calculate elasticities.
Tests on the entire population: the t-test is meant for samples drawn from a population, not for data that cover the entire population. We rarely need to worry about this in policy analysis.

F-test of overall significance (Equation 5.14, for the midterm exam):
$F = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2 / K}{\sum e_i^2 / (N - K - 1)}$
H0: βN = βP = βI = 0 (all slope coefficients equal zero)
HA: H0 is not true

4. Play around with Wassmer's Sprawl Data
Read the UA 2000 All Data file, formatted as an Excel spreadsheet.
Go to 2000 UA and save land area (column B) and urban fringe land area (column D).
Go to 2000 UA Continued and save workers 16 years and over (column AK) and 90+ minutes of commute time (column AX).
Merge them into one spreadsheet and rename.
Read the spreadsheet into SPSS.
Transform the variables.
Run a regression.

5. Homework Due at the Start of Meeting Six
(1) Read all of the material under meeting six in the syllabus; come prepared to discuss it.
(2) A typed, well-developed question from the reading assignment for week five.
(3) Answer questions 13 and 15 in Studenmund, Chapter 5, typed on separate pages.
(4) A heads-up: next week's homework will be to collect data for your paper's dependent variable and turn in an SPSS printout of its descriptive statistics. I suggest getting a head start on it this week.
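The Chapter 5 formulas from Section 2 (Equations 5.2, 5.5, 5.8, and 5.14) can be checked by hand on a small data set. The Python sketch below is illustrative only and rests on several assumptions:
- It uses a regression with a single independent variable (K = 1), so OLS can be computed without a statistics package.
- The function names are hypothetical; they are not SPSS commands or textbook notation.
- The critical value t_c is typed in rather than computed, just as it would be read from Table B-1.
- The data are made up.

```python
import math


def simple_ols(x, y):
    """OLS with one independent variable (K = 1); returns (intercept, slope, se_slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # The intercept takes whatever value makes the residuals average to zero.
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Error variance is estimated with N - K - 1 = N - 2 degrees of freedom.
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, se_slope


def t_statistic(beta_hat, se, beta_h0=0.0):
    """Equation 5.2: t_k = (beta_hat_k - beta_H0) / SE(beta_hat_k)."""
    return (beta_hat - beta_h0) / se


def reject_two_sided(t_k, t_c):
    """Two-sided decision rule: reject H0 if |t_k| > t_c."""
    return abs(t_k) > t_c


def confidence_interval(beta_hat, se, t_c):
    """Equation 5.5: beta_hat +/- t_c * SE(beta_hat)."""
    half_width = t_c * se
    return beta_hat - half_width, beta_hat + half_width


def simple_correlation(x1, x2):
    """Equation 5.8: simple correlation coefficient r_12, between -1 and +1."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    cross = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    return cross / math.sqrt(ss1 * ss2)


def f_statistic(y, y_hat, k):
    """Equation 5.14: F = (ESS / K) / (RSS / (N - K - 1))."""
    n = len(y)
    y_bar = sum(y) / n
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return (ess / k) / (rss / (n - k - 1))


# Made-up data for illustration only: N = 5 observations, K = 1.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
b0, b1, se = simple_ols(x, y)
t_k = t_statistic(b1, se)  # tests H0: beta = 0
t_c = 3.182  # two-sided, 5% level, N - K - 1 = 3 degrees of freedom
print("slope:", b1, "t:", round(t_k, 3), "reject H0:", reject_two_sided(t_k, t_c))
print("95% CI for slope:", confidence_interval(b1, se, t_c))
y_hat = [b0 + b1 * xi for xi in x]
print("F:", round(f_statistic(y, y_hat, 1), 3), "t squared:", round(t_k ** 2, 3))
# Perfectly collinear variables (one is a multiple of the other) give r = 1.
print("r12:", simple_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

With these made-up numbers, the slope's t-statistic is about 2.31. That is below the two-sided 5% critical value of 3.182 for 3 degrees of freedom, so H0: β = 0 is not rejected, and the 95% confidence interval for the slope includes zero. With only one independent variable, the F-statistic equals the square of the slope's t-statistic, which is a handy check on the arithmetic.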