Decision Tree

rpart {rpart}: Fit a rpart model

• Usage
  rpart(formula, data, …)

• Arguments
  formula : a formula, with a response but no interaction terms.
  data    : an optional data frame in which to interpret the variables named in the formula.

Decision Tree

> install.packages("rpart")
> install.packages("rpart.plot")
> library(rpart)
> library(rpart.plot)
> data <- iris
> head(data, 10)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
...

Decision Tree

> data.ct <- rpart(Species ~ ., data=data)
> rpart.plot(data.ct, type=1, extra=1, under=1)

In the formula Species ~ . the dot stands for "everything but Species", i.e. Species is modelled from Sepal.Length + Sepal.Width + Petal.Length + Petal.Width. rpart.plot then draws the fitted tree (the type, extra and under arguments control how the node information is displayed).

Decision Tree

> data.ct
n= 150

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)
  2) Petal.Length< 2.45 50   0 setosa (1.00000000 0.00000000 0.00000000) *
  3) Petal.Length>=2.45 100 50 versicolor (0.00000000 0.50000000 0.50000000)
    6) Petal.Width< 1.75 54  5 versicolor (0.00000000 0.90740741 0.09259259) *
    7) Petal.Width>=1.75 46  1 virginica (0.00000000 0.02173913 0.97826087) *
> summary(data.ct)

Decision Tree: node number 1 (the root node; its primary splits are the decision rules)

Node number 1: 150 observations,    complexity param=0.5
  predicted class=setosa  expected loss=0.6666667  P(node) =1
    class counts:    50    50    50
   probabilities: 0.333 0.333 0.333
  left son=2 (50 obs) right son=3 (100 obs)
  Primary splits:
      Petal.Length < 2.45 to the left,  improve=50.00000, (0 missing)
      Petal.Width  < 0.8  to the left,  improve=50.00000, (0 missing)
      Sepal.Length < 5.45 to the left,  improve=34.16405, (0 missing)
      Sepal.Width  < 3.35 to the right, improve=19.03851, (0 missing)
  Surrogate splits:
      Petal.Width  < 0.8  to the left,  agree=1.000, adj=1.00, (0 split)

Decision Tree: node number 2 (a leaf node)

Node number 2: 50 observations
  predicted class=setosa  expected loss=0  P(node) =0.3333333
    class counts:    50     0     0
   probabilities: 1.000 0.000 0.000
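Once fitted, the tree can also classify new observations. A minimal sketch, assuming the data.ct model fitted above (rpart(Species ~ ., data=data), with data <- iris); the new.flowers data frame and its two made-up measurements are purely illustrative and not part of the original session:

# Assumes data.ct was fitted above; new.flowers is a hypothetical example input
new.flowers <- data.frame(Sepal.Length = c(5.0, 6.5),
                          Sepal.Width  = c(3.4, 3.0),
                          Petal.Length = c(1.5, 5.5),
                          Petal.Width  = c(0.2, 2.0))

predict(data.ct, newdata = new.flowers, type = "class")  # predicted species labels
predict(data.ct, newdata = new.flowers, type = "prob")   # class probabilities from the leaves

Given the splits shown above, the first flower (Petal.Length < 2.45) should land in the setosa leaf and the second (Petal.Width >= 1.75) in the virginica leaf.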
Random Forest

randomForest {randomForest}

randomForest implements Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification and regression. It can also be used in unsupervised mode for assessing proximities among data points.

• Usage
  randomForest(formula, data, mtry, …)

• Arguments
  mtry : number of variables randomly sampled as candidates at each split.

Random Forest

> install.packages("randomForest")
> library(randomForest)
> data <- iris
> head(data, 10)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
...

Random Forest

> tuneRF(data[,-5], data[,5], doBest=T)

tuneRF tunes the random forest model by searching for the optimal value of mtry with respect to the Out-of-Bag (OOB) error estimate.

Random Forest: result of the random forest

> data.rf <- randomForest(Species ~ ., data=data, mtry=4)
> data.rf

Call:
 randomForest(formula = Species ~ ., data = data, mtry = 4)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 4

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08
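Beyond the confusion matrix, the fitted forest can report which predictors drive its decisions and can classify new data. A minimal sketch, assuming the data.rf object and the data frame from the session above; refitting with importance = TRUE is an extra argument not in the original call, added here only so that the permutation-based importance measure is computed as well:

# Refit the forest, additionally requesting permutation importance (extra argument vs. the slides)
data.rf <- randomForest(Species ~ ., data = data, mtry = 4, importance = TRUE)

importance(data.rf)     # per-variable importance (MeanDecreaseAccuracy, MeanDecreaseGini)
varImpPlot(data.rf)     # dot chart of the importance measures

predict(data.rf, newdata = head(data))                  # predicted class labels
predict(data.rf, newdata = head(data), type = "prob")   # vote fractions per class

For the iris data one would expect Petal.Length and Petal.Width to dominate the importance ranking, which matches the splits chosen by the single decision tree earlier.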