Decision Tree
rpart {rpart}
Fit an rpart model.
• Usage
  rpart(formula, data, …)
• Arguments
  formula : a formula, with a response but no interaction terms.
  data    : an optional data frame in which to interpret the variables
            named in the formula.
Decision Tree
> install.packages("rpart")
> install.packages("rpart.plot")
> library(rpart)
> library(rpart.plot)
> data <- iris
> head(data, 10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
…
Decision Tree
> data.ct <- rpart(Species ~ ., data=data)
> rpart.plot(data.ct, type=1, extra=1, under=1)

In the formula, "Species ~ ." means everything but "Species", i.e.
Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
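For reference, a sketch of the equivalent call with the dot written out
explicitly (illustrative, not from the original slides):

> # "." in the formula expands to every column except the response
> data.ct2 <- rpart(Species ~ Sepal.Length + Sepal.Width +
+                     Petal.Length + Petal.Width, data = data)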
Decision Tree
> data.ct
n= 150

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)
  2) Petal.Length< 2.45 50   0 setosa (1.00000000 0.00000000 0.00000000) *
  3) Petal.Length>=2.45 100 50 versicolor (0.00000000 0.50000000 0.50000000)
    6) Petal.Width< 1.75 54   5 versicolor (0.00000000 0.90740741 0.09259259) *
    7) Petal.Width>=1.75 46   1 virginica (0.00000000 0.02173913 0.97826087) *
> summary(data.ct)
Decision Tree
Node number 1: 150 observations,    complexity param=0.5
  predicted class=setosa  expected loss=0.6666667  P(node) =1
    class counts:    50    50    50
   probabilities: 0.333 0.333 0.333
  left son=2 (50 obs) right son=3 (100 obs)
  Primary splits:
      Petal.Length < 2.45 to the left,  improve=50.00000, (0 missing)
      Petal.Width  < 0.8  to the left,  improve=50.00000, (0 missing)
      Sepal.Length < 5.45 to the left,  improve=34.16405, (0 missing)
      Sepal.Width  < 3.35 to the right, improve=19.03851, (0 missing)
  Surrogate splits:
      Petal.Width  < 0.8  to the left,  agree=1.000, adj=1.00, (0 split)

Node number 1 is the root node; the primary splits are the candidate
decision rules at this node, ranked by improvement.
Decision Tree
Node number 2: 50 observations
  predicted class=setosa  expected loss=0  P(node) =0.3333333
    class counts:    50     0     0
   probabilities: 1.000 0.000 0.000

Node number 2 is a leaf (terminal) node: every observation in it is setosa.
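As a usage sketch (not on the original slides), the fitted tree can be
applied with predict(); type="class" returns the predicted species, and
table() cross-tabulates them against the true labels:

> # Illustrative: predict with the fitted tree, then build a
> # confusion matrix against the actual species.
> pred <- predict(data.ct, newdata = data, type = "class")
> table(actual = data$Species, predicted = pred)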
Random Forest
randomForest {randomForest}
randomForest implements Breiman's random forest algorithm (based on
Breiman and Cutler's original Fortran code) for classification and
regression. It can also be used in unsupervised mode for assessing
proximities among data points.
• Usage
  randomForest(formula, data, mtry, …)
• Arguments
  mtry : number of variables randomly sampled as candidates at each split.
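For context, an assumption taken from the randomForest documentation rather
than the slides: for classification the default mtry is floor(sqrt(p)),
where p is the number of predictors, so iris's four predictors give a
default of 2.

> # Default mtry for classification: floor(sqrt(number of predictors))
> floor(sqrt(ncol(iris) - 1))
[1] 2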
Random Forest
> install.packages("randomForest")
> library(randomForest)
> data <- iris
> head(data, 10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
…
Random Forest
> tuneRF(data[,-5], data[,5], doBest=T)

tuneRF() tunes the random forest model, searching for the optimal value of
mtry with respect to the Out-of-Bag (OOB) error estimate; with doBest=TRUE
it refits and returns a forest using the best mtry found.
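A more explicit tuneRF() sketch, with illustrative parameter values that
are assumptions rather than slide content:

> set.seed(42)                        # illustrative seed for reproducibility
> tuned.rf <- tuneRF(data[,-5], data[,5],
+                    ntreeTry = 500,  # trees grown per candidate mtry
+                    stepFactor = 2,  # scale mtry by this factor each step
+                    improve = 0.01,  # minimum relative OOB improvement to continue
+                    doBest = TRUE)   # refit and return the best forest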
Random Forest
> data.rf <- randomForest(Species ~ ., data=data, mtry=4)
> data.rf

Call:
 randomForest(formula = Species ~ ., data = data, mtry = 4)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 4

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08
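As a closing usage sketch (not on the original slides), the fitted forest
can score new rows with predict(), and its variable-importance measures can
be inspected:

> # Illustrative: predict the first five rows, then inspect importance.
> predict(data.rf, newdata = data[1:5, ])
> importance(data.rf)    # mean decrease in Gini per predictor
> varImpPlot(data.rf)    # dot chart of the importance measures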