Model Comparison - NCSU Statistics

Framingham data comparisons.
(1) In a recent model comparison lecture, we looked at the two-feature data set generated to show the
neural net's flexibility, and we compared 3 models. Here, we look at some real data, the famous
Framingham heart study data. Using stepwise logistic regression we already found that the important
predictors were age, smoke (yes, no), SBP32 (systolic blood pressure), and Cholest2 (cholesterol).
Get the data from your AAEM folder. Declare firstchd and smoker as binary. Set the 4 variables above
to inputs, firstchd to target, and everything else to rejected. Nothing to write up for part (1).
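(For orientation only: the assignment is done entirely by point-and-click in SAS Enterprise Miner, but the selected model corresponds roughly to the sketch below in open-source software. The file name framingham.csv and the column names firstchd, age, smoker, sbp, and cholesterol are hypothetical stand-ins, not the actual AAEM variable names.)

# Hedged sketch of the stepwise-selected logistic model; all names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

frmgham = pd.read_csv("framingham.csv")  # hypothetical file name
fit = smf.logit("firstchd ~ age + smoker + sbp + cholesterol", data=frmgham).fit()
print(fit.summary())  # coefficients for the four selected predictors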
(2) Use a search engine (e.g., Google) to research and write a brief, one-paragraph introduction to the
Framingham study as a start to your report. Also research and write a second paragraph describing
firstchd (first-stage coronary heart disease).
(3) Continue your report being sure to include these items:
Prepare a diagram with 75% training and 25% validation data and fit a regression, a neural net, and a
class probability tree. Be sure to mention what subtree assessment measure is used for this tree (i.e. for
getting probability estimates). Describe any other properties that you choose to change from their
defaults (your option) in any of these model nodes. Look at the output of the neural network node. Did it
converge? Did it use all 50 iterations? If not, how many did it use?
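(For students who want to check their SAS/EM results against open-source tools, a rough analogue of the 75%/25% split and the three models is sketched below. The data generated here are synthetic stand-ins, not the Framingham data, and the model settings are illustrative assumptions rather than the SAS/EM defaults.)

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the Framingham data)
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([rng.normal(50, 10, n),    # age (stand-in)
                     rng.integers(0, 2, n),    # smoker (stand-in)
                     rng.normal(140, 20, n),   # systolic blood pressure (stand-in)
                     rng.normal(230, 40, n)])  # cholesterol (stand-in)
y = rng.integers(0, 2, n)                      # stand-in for the firstchd target

# 75% training / 25% validation, as in the SAS/EM data partition
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# Three competing models: regression, neural net, and a tree used for
# class probabilities (settings here are illustrative, not SAS/EM defaults)
models = {
    "regression": LogisticRegression(max_iter=200),
    "neural net": MLPClassifier(hidden_layer_sizes=(3,), max_iter=50, random_state=1),
    "class probability tree": DecisionTreeClassifier(min_samples_leaf=50, random_state=1),
}
for name, model in models.items():
    model.fit(X_train, y_train)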
Connect these 3 models to a model comparison node. Under data selection and model selection,
leave the entries as "Default," which should already be their setting when you start. When you do this, SAS/EM
will list the measure used to select the model and the data set (training or validation) as the first of the
various available model comparison statistics. Which statistic was used and what are its values for the
three models? Note that the best of these models is listed first and a Y appears next to it (this means
that model will be used for scoring a future data set). For your 3 models, what are the areas under the
ROC curve (ROC index) and the lifts as reported in the model comparison output? Since the lift chart is a
curve, not a number, it must have been evaluated at some depth (i.e. percentile). Looking at the results,
report what that percentile is. For each of these other two assessment statistics, which model would
have been chosen?
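(Continuing the sketch above, the ROC index is simply the area under the ROC curve on the validation data, and lift at a given depth is the event rate among the top-scored fraction of cases divided by the overall event rate. The 10% depth below is an arbitrary illustration, not necessarily the depth SAS/EM reports; finding that depth is part of the question.)

import numpy as np
from sklearn.metrics import roc_auc_score

def lift_at_depth(y_true, p_hat, depth):
    """Event rate in the top `depth` fraction of cases, ranked by predicted
    probability, divided by the overall event rate."""
    y_true = np.asarray(y_true)
    k = max(1, int(round(depth * len(y_true))))
    top = y_true[np.argsort(p_hat)[::-1][:k]]
    return top.mean() / y_true.mean()

for name, model in models.items():
    p = model.predict_proba(X_valid)[:, 1]  # predicted probability of the event
    print(name,
          "ROC index:", round(roc_auc_score(y_valid, p), 3),
          "lift at 10% depth:", round(lift_at_depth(y_valid, p, 0.10), 2))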
(4) (perhaps an appendix to your report) Describe what happens if you change to a decision tree instead
of a class probability tree by adjusting the model selection statistic property. Include a discussion of the
splits that result. What will be our misclassification rate if we simply say that nobody will get first-stage
coronary heart disease? This may explain what you observed.
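(Hint: if a fraction p of the subjects in the data actually experienced first-stage CHD, then the rule "nobody gets CHD" misclassifies exactly those subjects, so its misclassification rate equals the event rate p. A minimal sketch, assuming the target y is coded 0/1 with 1 = event:)

import numpy as np

def baseline_misclassification(y):
    """Misclassification rate of always predicting 'no event' for a 0/1 target:
    it equals the proportion of 1's (the event rate) in y."""
    return np.asarray(y).mean()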
Optional (not graded): Turning to the neural net, it appears from the iteration plot that the error
function on the validation data set gets worse from the beginning. If, however, you turn off the default
“preliminary training” as we did in the class demo, which iteration gives the best validation error
function?
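(If you read the validation error off the iteration plot, or export it from the results, the best iteration is simply the one that minimizes it. The values below are made-up placeholders, not results from the Framingham run.)

import numpy as np

valid_error = np.array([0.520, 0.495, 0.471, 0.468, 0.473, 0.480])  # placeholder values
best_iteration = int(np.argmin(valid_error)) + 1  # 1-based iteration number
print("best validation iteration:", best_iteration)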