Parameter Estimation, Dummies, & Model Fit

• We know mechanically how to “run a regression”… but how are the parameters actually estimated?
• How can we handle “categorical” explanatory (independent) variables?
• What is a measure of “goodness of fit” of a statistical model to data?

Example: Alien Species

• Exotic species cause economic and ecological damage.
• Not all countries are equally invaded.
• We want to understand which characteristics of a country make it more likely to be “invaded”.

Understanding Invasive Species

Steps to improving our understanding:
1. Generate a set of hypotheses (so they can be “accepted” or “rejected”).
2. Develop a statistical model. Interpret the hypotheses in the context of the statistical model.
3. Collect data. Estimate the parameters of the model.
4. Test the hypotheses.

2 Hypotheses (in words)

• We’ll measure “invasiveness” as the ratio of alien to native species (article by Dalmazzone).
1. Population density plays a role in a country’s invasiveness.
2. Island nations are more invaded than mainland nations.

[Figure: Population Density. A.N (vertical axis) plotted against Pop.dens (horizontal axis).]

[Figure: Island vs. Mainland. A.N (vertical axis) plotted against Island (horizontal axis).]

Variables

Dependent: Ratio of the number of alien species to native species in each country.
Independent:
• Island?
• Population density
• GDP per capita
• Agricultural activity

Computer Minimizes $\sum_i e_i^2$

• Remember, OLS finds the coefficients that minimize the sum of squared residuals.
• Graphical representation
• Why is this appropriate? It can be shown that this criterion yields the most precise unbiased estimates (among linear estimators, under the standard regression assumptions).

Dummy Variable

• Generally: use a “dummy variable”. Value = 1 if the country is an island, 0 otherwise.
• Other examples: Male/Female; Pre-regulation/Post-regulation; etc.
• More generally, if there are n categories, use n-1 dummies. E.g., to distinguish between 6 continents, use 5 dummies.
• Problem: each additional dummy costs a “degree of freedom”.

A Simple Model

A simple linear model looks like this:

$A_i = \beta_1 + \beta_2 ISL_i + \beta_3 P_i + \beta_4 GDP_i + \beta_5 AGR_i + \varepsilon_i$

The dummy changes the intercept (explain).
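The estimation step and the effect of the dummy can be sketched in a few lines of Python. This is only an illustration, not the analysis behind these slides: the data are synthetic, the coefficients used to generate them are invented, and the names `fit_ols` and `sum_sq_resid` are hypothetical helpers. The sketch fits an intercept, an island dummy, and population density by least squares, then reads off the two intercepts implied by the dummy.

```python
import numpy as np


def sum_sq_resid(beta, X, y):
    """Sum of squared residuals for a given coefficient vector."""
    resid = y - X @ beta
    return float(resid @ resid)


def fit_ols(X, y):
    """Return the coefficients that minimize the sum of squared residuals."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat


# Synthetic data, invented for illustration only (not the Dalmazzone data).
rng = np.random.default_rng(0)
n = 50
island = rng.integers(0, 2, size=n).astype(float)  # dummy: 1 = island, 0 = mainland
pop_dens = rng.uniform(0.0, 1200.0, size=n)

# Assumed coefficients, used only to generate the fake response.
assumed_beta = np.array([0.05, 0.10, 0.001])

# Design matrix: column of ones (intercept), the dummy, population density.
X = np.column_stack([np.ones(n), island, pop_dens])
y = X @ assumed_beta + rng.normal(0.0, 0.05, size=n)

beta_hat = fit_ols(X, y)
residuals = y - X @ beta_hat
ssr = sum_sq_resid(beta_hat, X, y)

# The dummy shifts the intercept; the slope on pop_dens is shared by both groups.
intercept_mainland = beta_hat[0]
intercept_island = beta_hat[0] + beta_hat[1]

print("estimated coefficients:", beta_hat)
print("sum of squared residuals:", ssr)
print("mainland intercept:", intercept_mainland)
print("island intercept:", intercept_island)
```

Any other coefficient vector passed to `sum_sq_resid` gives a sum of squared residuals at least as large as `ssr`, which is what "OLS minimizes the sum of squared residuals" means in practice.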
Interaction dummy variable? E.g., invasions of island nations may be more strongly affected by agricultural activity. An interaction term ($ISL_i \times AGR_i$) would change the slope on agricultural activity rather than the intercept.

Translating our Hypotheses

2 hypotheses:
• Hypothesis 1 (Population): focus on $\beta_3$.
• Hypothesis 2 (Island): focus on $\beta_2$.
“Hypothesis testing”… forthcoming in the course.

Parameter Estimates

               Value   Std.Error   t value   Pr(>|t|)
(Intercept)  -0.0184      0.0859   -0.2141     0.8326
Island        0.0623      0.0821    0.7588     0.4564
Pop.dens      0.0010      0.0002    6.2330     0.0000
GDP           0.0000      0.0000    3.3249     0.0032
Agr          -0.0014      0.0015   -0.9039     0.3763

“Goodness of Fit”: $R^2$

• $R^2$ is the “coefficient of determination”.
• $R^2$ = squared correlation between Y and the OLS prediction of Y.
• $R^2$ = share of total variation that is explained by the regression; it lies in [0,1].
• OLS maximizes $R^2$.
• Adding an independent variable cannot decrease $R^2$.
• Adjusted $R^2$ penalizes for the number of variables.

Answers

• Island nations are more heavily invaded (.0623), but the effect is not significant (p=.46).
• Population density has an impact on invasions (.001), and the effect is significant (p=.0000).
• $R^2$=.80; about 80% of the variation in the dependent variable is explained by the model. Also, corr(A, Ahat)=.89.
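The goodness-of-fit statements can be checked numerically. The sketch below is an illustration on synthetic data, not the invasive species data set. The helper names (`ols_predict`, `r_squared`, `adjusted_r_squared`) are hypothetical, and the adjusted formula is the common textbook one, $1 - (1 - R^2)(n-1)/(n-k-1)$ with k regressors excluding the intercept, which is assumed here rather than taken from the slides.

```python
import numpy as np


def ols_predict(X, y):
    """Fit OLS by least squares and return the fitted values."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta_hat


def r_squared(y, y_hat):
    """Share of total variation in y explained by the fitted values."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def adjusted_r_squared(r2, n_obs, n_regressors):
    """Adjusted R^2; n_regressors excludes the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# Synthetic data, invented for illustration only.
rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
noise_var = rng.normal(size=n)  # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])     # intercept + one regressor
X_big = np.column_stack([X_small, noise_var])   # same, plus an irrelevant regressor

y_hat_small = ols_predict(X_small, y)
y_hat_big = ols_predict(X_big, y)

r2_small = r_squared(y, y_hat_small)
r2_big = r_squared(y, y_hat_big)

# With an intercept in the model, R^2 equals the squared correlation
# between y and its OLS prediction.
corr_sq = float(np.corrcoef(y, y_hat_small)[0, 1] ** 2)

adj_small = adjusted_r_squared(r2_small, n, 1)
adj_big = adjusted_r_squared(r2_big, n, 2)

print("R^2 (small model):", r2_small, "squared correlation:", corr_sq)
print("R^2 (with irrelevant regressor):", r2_big)
print("adjusted R^2:", adj_small, adj_big)
```

The same relationship appears in the reported results: corr(A, Ahat) = .89 gives .89² ≈ .79, which agrees with the reported R² of .80 up to rounding.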