UNC-Wilmington
Dept. Economics and Finance
ECN 525
Dr. Chris Dumas

Model Building Example 1: Air Pollution & Lung Cancer

Suppose that public health officials suspect a relationship between the air pollution index (X) in various cities and the number of new lung cancer cases per year per 100,000 people (Y). Suppose that you have available the following (hypothetical) data on X and Y in various cities. Public health officials say that more air pollution leads to more lung cancer, whereas industry representatives claim that lung cancer is caused by other factors (smoking, etc.).

Air Pollution Index (X)    New Lung Cancer Cases per Year per 100,000 People (Y)
100                        304
134                        323
178                        365
205                        341
240                        400
160                        336
134                        301
122                        327
210                        343
136                        292
227                        364
130                        292
162                        335
170                        302
187                        343
156                        317
176                        355
144                        357
127                        318
139                        298
152                        329
166                        334
188                        321

1) Based on the available data given in the table above, try to find a pattern in the data, and form a (very) simple theory about the relationship between the air pollution index and new lung cancer cases. State your theory in a sentence. Remember that a theory is a pattern in data, so your statement of a theory should be a description of a possible pattern in the data. When describing the pattern, be sure to specify the particular variables involved in the pattern and how they relate together.

2) <this question deleted>

3) Use Excel to construct a graph of the air pollution index vs. new lung cancer cases and look for any pattern (i.e., does the pattern in your graph even vaguely support your theory?). (Note that by constructing a graph and looking for a pattern to support your theory, you are engaging in inductive reasoning, or inference.)

4) Based on your graph, it should be uncertain which of two patterns is present in your data (i.e., a flat line, indicating no relationship, or an upward sloping line, indicating a relationship). Use Nested Modeling to determine the best operators for a model of your theory (by doing so, you determine which pattern is best supported by the data).

5) Use Excel to determine the best parameters for your model. (This should be a by-product of your Nested Modeling work.)

6) What does Statistical Hypothesis Testing (the t-stat and the Adjusted R-squared numbers) say about the quality of your model results?

7) Conduct a Sensitivity Analysis of your model with respect to the parameters. The model results (that is, Y) are most sensitive to estimates of which model parameter?

8) Derive two theorems from your theory. The theorems should answer the following questions:
(a) If the air pollution index increases by 50 units, what is the increase in the number of new lung cancer cases?
(b) If there were no air pollution in a city, what would be the baseline/background number of new lung cancer cases per year?

9) Suppose industry representatives say that it is smoking that is causing most of the new lung cancer cases. Describe a natural experiment that would allow you to test this claim.

10) By performing the analysis in this homework, have you proved your theory from question (1) above? If so, why? If not, why not? (What other factors could explain the pattern observed in your data?)
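
The assignment asks for Excel, but the comparison in questions 4 through 8 can also be cross-checked with a short script. The following is a minimal Python sketch, not the course's prescribed method. It assumes that the X and Y values in the table pair up by position, and that "Nested Modeling" here means comparing a flat-line model (Y = a) against a straight-line model (Y = a + b*X) that contains it as the special case b = 0. The function names (fit_flat, fit_line, adjusted_r2) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical data from the table; pairing X and Y by position is an assumption.
X = [100, 134, 178, 205, 240, 160, 134, 122, 210, 136, 227, 130,
     162, 170, 187, 156, 176, 144, 127, 139, 152, 166, 188]
Y = [304, 323, 365, 341, 400, 336, 301, 327, 343, 292, 364, 292,
     335, 302, 343, 317, 355, 357, 318, 298, 329, 334, 321]


def fit_flat(y):
    """Restricted model Y = a. Least squares gives a = mean(Y)."""
    a = sum(y) / len(y)
    sse = sum((yi - a) ** 2 for yi in y)  # sum of squared errors
    return a, sse


def fit_line(x, y):
    """Unrestricted model Y = a + b*X, fitted by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return a, b, sse, b / se_b             # last value is the slope's t-stat


def adjusted_r2(sse, sst, n, k):
    """Adjusted R-squared; k counts slope parameters, excluding the intercept."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


a_flat, sst = fit_flat(Y)            # the flat model's SSE is the total sum of squares
a, b, sse, t_b = fit_line(X, Y)
adj_r2 = adjusted_r2(sse, sst, len(Y), 1)

increase_50 = 50 * b                 # question 8(a): change in Y for +50 units of X
baseline = a                         # question 8(b): predicted Y at X = 0

print(f"flat model:  Y = {a_flat:.2f}")
print(f"line model:  Y = {a:.2f} + {b:.4f} * X")
print(f"slope t-stat = {t_b:.2f}, adjusted R-squared = {adj_r2:.3f}")
print(f"+50 units of X -> {increase_50:.2f} more cases; baseline = {baseline:.2f}")
```

If the slope's t-stat is large in absolute value (a common rule of thumb is above about 2), the straight-line model is preferred over the flat line. Note that the baseline in 8(b) extrapolates well below the smallest observed X (100), so it should be read with caution.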