SUMMARY
Decision trees have many uses: exploratory data analysis, variable selection, modeling, and more. In today’s discussion we will cover:
* What are decision trees. Decision trees are extremely versatile, easy to interpret, and require little data preparation.
* Decision tree packages in R. rpart (the package used today), C50, and Cubist.
* Enhancing tree outputs. One of the attractive features of trees is that they are easy to interpret. However, the default output of the rpart package could use a little enhancing.

What are Trees | Some packages in R | Enhancing Tree Outputs | References

What are Trees
A decision tree is an algorithm that can have a continuous or categorical dependent variable (DV) and continuous or categorical independent variables (IVs).

There are many advantages to using trees [1]:
* Simple to understand and interpret. People are able to understand decision tree models after a brief explanation.
* Requires little data preparation. Other techniques often require data normalisation, creation of dummy variables, and removal of blank values.
* Able to handle both numerical and categorical data.
* Uses a white-box model. If a given situation is observable in a model, the explanation for the condition is easily expressed in Boolean logic.
* Possible to validate a model using statistical tests, which makes it possible to account for the reliability of the model.
* Performs well with large data in a short time.

Some things to consider when coding the model:
* Splits. Gini or information.
* Type of DV (method). Classification (class), regression (anova), count (poisson), survival (exp).
* Minimum number of observations for a split (minsplit).
* Minimum number of observations in a terminal node (minbucket).
* Cross-validation (xval). Used more in model building than in exploration.
* Complexity parameter (cp). This value is used for pruning. A smaller tree is perhaps less detailed, but may predict new data with less error.

Some packages in R
R has many packages for similar or identical tasks:
* rpart. Comes with R.
* C50.
* Cubist.
* rpart.plot. Makes rpart plots much nicer.

Enhancing Tree Outputs
An alternative to the default rpart plots is the prp function in the rpart.plot package. Some useful arguments:
* extra. Values 1 to 9 display extra information at the nodes.
* box.col. Defines the colors of the node boxes, for example in the leaves.
* xflip. Flips the tree horizontally.
* nn. Adds node numbers for easier interpretation.

References
1. http://en.wikipedia.org/wiki/Decision_tree_learning
2. http://www.stanford.edu/class/stats315b/minitech.pdf
3. http://www.milbo.org/rpart-plot/prp.pdf
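
The options listed under "Some things to consider when coding the model" and the prp arguments under "Enhancing Tree Outputs" can be sketched in R roughly as follows. This is a minimal illustration, not the code used in the talk. The kyphosis data set (bundled with rpart), the formula, the seed, and every parameter value are assumptions chosen only for demonstration. The plot step assumes rpart.plot is installed and falls back to the base rpart plot otherwise.

```r
# Minimal sketch: fit, inspect, prune, and plot a classification tree.
# Assumption: the kyphosis example data shipped with rpart stands in for real data.
library(rpart)

set.seed(42)  # cross-validation assigns folds at random; fix the seed for repeatability

fit <- rpart(
  Kyphosis ~ Age + Number + Start,
  data    = kyphosis,
  method  = "class",                      # type of DV: "class", "anova", "poisson", "exp"
  parms   = list(split = "information"),  # splitting rule: "gini" or "information"
  control = rpart.control(
    minsplit  = 20,    # minimum observations in a node before a split is attempted
    minbucket = 7,     # minimum observations in any terminal node
    xval      = 10,    # number of cross-validation folds
    cp        = 0.01   # complexity parameter used while growing the tree
  )
)

# The cp table lists, for each tree size, the cross-validated error (xerror).
printcp(fit)

# Prune back to the cp value with the lowest cross-validated error.
cp_table <- fit$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned   <- prune(fit, cp = best_cp)

# Enhanced plot of the unpruned tree with prp, if rpart.plot is available.
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(
    fit,
    extra   = 1,            # show the number of observations per class in each node
    box.col = "lightblue",  # box color (illustrative choice)
    xflip   = TRUE,         # flip the tree horizontally
    nn      = TRUE          # display node numbers
  )
} else {
  plot(fit)   # base rpart plot as a fallback
  text(fit)
}
```

Because cross-validation is random, the cp value selected for pruning can change from run to run unless a seed is set. Depending on the data, the pruned tree may be as small as a single root node.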