Bayesian Classification and Regression Tree
Analysis (CART)
Teresa Jacobson
Department of Applied Mathematics and Statistics
Jack Baskin School of Engineering
UC Santa Cruz
March 11, 2010
Outline
Introduction
Bayesian Model
Chipman, George, and McCulloch (CGM)
Wu, Tjelmeland, and West (WTW)
Example
Extensions and future work
Bibliography
What is CART?
The general aim of classification and regression tree analysis: given a set of observations y_i and associated variables x_ij, i = 1:n and j = 1:p, find a way of using x to partition the observations into homogeneously distributed groups, then use group membership to predict y.
Use binary trees to recursively split the observations with yes/no questions about the variables in x, and assume each end (terminal) node has a homogeneous distribution.
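As a point of reference before going Bayesian, a classical greedy CART fit takes only a few lines; a minimal sketch using R's rpart package (shown only for orientation, not part of the methods in this talk):

# Classical (greedy) CART regression tree on the iris data.
# Recursive binary splits on the predictors; each leaf predicts the
# mean petal length of its group.
library(rpart)

fit <- rpart(Petal.Length ~ Sepal.Length + Petal.Width, data = iris)

print(fit)                            # text form of the fitted tree
predict(fit, newdata = iris[1:3, ])   # leaf-mean predictions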
How do we do this?
Seminal work by Breiman et al. [1] was surprisingly Bayesian, involving the elicitation of priors and risk/utility functions on misclassification. However, the actual tree-generation methods were still very ad hoc.
After this work was published, a large number of different ad-hoc methods appeared, as well as attempts to combine them to produce better inferential strategies. These methods are largely deterministic in nature and produce one tree per method.
Going Bayesian: The Problem
[Image: “p = ?”: what probability should we assign to a given tree? Image courtesy of Diesel-stock, Diesel-stock.deviantart.com.]
Notation
Notation follows that of Wu, Tjelmeland, and West (WTW) [7].
Observations y_i, “regressors” x_i, with i ∈ I = {1:n} and j ∈ {1:k}. We wish to predict y ∈ Y based on the associated x ∈ X = X_1 × · · · × X_k.
Nodes are labeled u, with the root denoted node 0 and each non-terminal node u having children 2u + 1 (left) and 2u + 2 (right). Trees are then defined as appropriate subsets of the set N = {0, 1, 2, . . .}. Write the number of nodes of a tree T as m(T).
Splitting: for each non-terminal node u, choose a predictor variable index k_T(u) and a splitting threshold τ_T(u) ∈ X_{k_T(u)}. An observation is then sent to the left child of u if x_{k_T(u)} ≤ τ_T(u).
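A minimal sketch of this indexing and routing scheme in R (the helper names and tree representation are my own, not from WTW):

# Node indexing: root = 0; children of u are 2u+1 (left) and 2u+2 (right).
left   <- function(u) 2 * u + 1
right  <- function(u) 2 * u + 2
parent <- function(u) (u - 1) %/% 2   # integer division; the root has no parent

# Route a single observation x (a numeric vector) down a tree. The tree is
# represented by two lists keyed by node label: split.k holds the predictor
# index k_T(u) and split.tau the threshold tau_T(u); leaves are absent.
route <- function(x, split.k, split.tau) {
  u <- 0
  while (!is.null(split.k[[as.character(u)]])) {
    k   <- split.k[[as.character(u)]]
    tau <- split.tau[[as.character(u)]]
    u   <- if (x[k] <= tau) left(u) else right(u)
  }
  u   # label of the terminal node reached
}

# Example: a single split on predictor 2 at threshold 1.5
route(c(5.0, 1.2), list(`0` = 2), list(`0` = 1.5))   # returns 1 (left child)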
Example tree
[Figure: example tree fit to the iris data (height = 4, log(p) = 134.866). The root splits on Petal.Width <> 1.5, with further splits on Sepal.Length <> 6.2, Petal.Width <> 0.6, and Sepal.Length <> 5.9; five leaves holding 30, 17, 13, 11, and 19 observations.]
Likelihood
Each terminal node (leaf) is viewed as a random sample from some distribution with density φ(·|θ_u), where θ_u depends only on the leaf.
Usually φ is either multinomial (categorical outcomes) or normal (continuous outcomes).
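In the normal case, for example, the likelihood factorizes over leaves; a minimal sketch (the function and argument names are my own notation):

# Log-likelihood of a tree with normal leaf models: the data in leaf u are
# treated as iid N(mu_u, sigma_u^2), so the log-likelihood is a sum of
# within-leaf normal log-densities.
tree.loglik <- function(y, leaf, mu, sigma) {
  # y: responses; leaf: terminal-node label of each observation
  # mu, sigma: per-leaf parameters, named by node label
  sum(dnorm(y,
            mean = mu[as.character(leaf)],
            sd   = sigma[as.character(leaf)],
            log  = TRUE))
}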
Tree prior
Simplify by using a prior of the form
p(Θ, T) = p(Θ|T) p(T),
and Chipman, George, and McCulloch (CGM) specify p(T) implicitly by using a tree-generating process:
1. Begin by setting T to be the trivial one-node tree.
2. Split each terminal node u with probability p_split(u, T).
3. If a node splits, assign a splitting rule τ_T(u) according to some distribution p(τ_T(u)|u, T). Update T to reflect the new tree, and repeat steps 2 and 3.
Tree prior (cont.)
Consider
p_split(u, T) = α(1 + d_u)^(−β),   β ≥ 0, 0 ≤ α ≤ 1,
where d_u is the depth of node u. Consider finite splitting values. Suggestion: choose k uniformly from the available predictors, then choose τ from the set of observed values if x_k is quantitative, or from the available subsets if it is qualitative.
For Θ, use iid normal-inverse-gamma priors for Θ|T when constructing a regression tree and Dirichlet priors when constructing a classification tree. CGM suggest choosing the hyperparameters based on fitting a greedy tree model.
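A minimal sketch of drawing a tree skeleton from this generating process (the function name and default values are my own, and only quantitative predictors are handled):

# Draw a tree from the CGM tree-generating prior: split each node with
# probability alpha * (1 + depth)^(-beta), choosing the predictor uniformly
# and the threshold uniformly from its observed values.
sim.cgm.tree <- function(X, alpha = 0.95, beta = 1) {
  splits <- list()
  grow <- function(u, depth) {
    if (runif(1) < alpha * (1 + depth)^(-beta)) {
      k   <- sample(ncol(X), 1)          # uniform over predictor indices
      tau <- sample(X[, k], 1)           # uniform over observed values of x_k
      splits[[as.character(u)]] <<- c(k = k, tau = tau)
      grow(2 * u + 1, depth + 1)         # left child
      grow(2 * u + 2, depth + 1)         # right child
    }
  }
  grow(0, 0)   # start at the root, depth 0
  splits       # splitting rules by node label; empty list = trivial tree
}

str(sim.cgm.tree(iris[, 1:4]))   # e.g. one draw on the iris predictors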
Fitting procedure
Proceed through MCMC. Interest focuses on the steps for sampling the tree structure. CGM use a Metropolis-Hastings step with a transition kernel choosing randomly among four moves (a skeleton of this step follows the list):
- Grow: pick a terminal node and split it into two children,
- Prune: pick a parent of two terminal nodes and collapse it,
- Change: pick an internal node and reassign its splitting rule,
- Swap: pick a parent-child pair and swap their splitting rules, unless the other child of the parent has the same rule, in which case give both children the splitting rule of the parent.
All steps are reversible, so the Markov chain is reversible.
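Here is that skeleton, with equal move probabilities for illustration. The propose.* helpers and log.post (the log posterior with Θ integrated out) are hypothetical placeholders, not CGM's code:

# One Metropolis-Hastings update of the tree structure (skeleton).
# Each propose.* function would implement one CGM move, returning the
# proposed tree and the log proposal ratio log q(T|T')/q(T'|T).
mh.tree.step <- function(tree, data,
                         move.probs = c(grow = 0.25, prune = 0.25,
                                        change = 0.25, swap = 0.25)) {
  move <- sample(names(move.probs), 1, prob = move.probs)
  prop <- switch(move,
                 grow   = propose.grow(tree, data),
                 prune  = propose.prune(tree),
                 change = propose.change(tree, data),
                 swap   = propose.swap(tree))
  # Accept with the usual M-H ratio: posterior ratio times proposal ratio.
  log.a <- log.post(prop$tree, data) - log.post(tree, data) + prop$log.q.ratio
  if (log(runif(1)) < log.a) prop$tree else tree
}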
Limitations
- Relatively slow mixing: a tendency to stay in a local area,
- A tendency to get “stuck” in a local mode: CGM suggest repeated restarting, either from the trivial tree or from trees found by other methods such as bootstrap bumping,
- No single tree output, and no good way of picking one “good” tree from the sample.
WTW propose two significant improvements to CGM’s method:
- an improved prior on the tree structure, the “pinball prior”,
- a new M-H move, the “tree restructure” move.
They also allow for infinitely many splitting values, via a prior on the space of splitting values. A prior with finite point masses recovers that of CGM as a special case.
Pinball prior
Idea: generate some number of terminal nodes m(T), then “cascade” these nodes down the tree, randomly splitting them left/right with some probability until the nodes define individual leaves.
- Specify a prior density for tree size, m(T) ∼ α(m(T)). A natural choice is Poisson: m(T) = 1 + Pois(λ) for some specified λ.
- Construct a prior density for splitting, β(m_{l(u)}(T) | m_u(T)), where m_{l(u)}(T) is the number sent left out of the m_u(T) nodes that have cascaded down to node u. There are a number of choices for β, e.g. uniform or binomial.
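A minimal simulation of the resulting tree shapes (the function name is my own; the binomial choice of β is one of the options above, shifted so each side receives at least one leaf):

# Simulate a tree shape from the pinball prior: draw the number of leaves
# m(T) = 1 + Pois(lambda), then cascade them down from the root, at each
# internal node sending m.left of the m nodes to the left child.
sim.pinball <- function(lambda = 4) {
  m0 <- 1 + rpois(1, lambda)               # prior on tree size m(T)
  leaves <- integer(0)
  cascade <- function(u, m) {
    if (m == 1) {                          # a single node defines a leaf
      leaves <<- c(leaves, u)
      return(invisible(NULL))
    }
    m.left <- 1 + rbinom(1, m - 2, 0.5)    # 1, ..., m-1 leaves go left
    cascade(2 * u + 1, m.left)
    cascade(2 * u + 2, m - m.left)
  }
  cascade(0, m0)
  leaves    # node labels of the terminal nodes
}

sim.pinball(4)   # e.g. the leaf labels of one sampled shape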
Tree restructure move
Idea: restructure the tree branches without changing the terminal categories.
- Begin at node 0,
- Recursively identify the possible splitting rules that leave the terminal categories unchanged,
- Choose one such splitting rule, and repeat until the terminal nodes are fully specified.
This move radically restructures the tree without affecting the categorization and eliminates the tendency to get stuck near local maxima: effective exploration of the posterior → better mixing, better posterior inference.
Example
Iris data: we wish to use sepal length and petal width to predict petal length. Divide the data into two sets: 30 of each species for tree creation and the remaining 20 of each for evaluation.
> iris.subsample.index <- c(sample(1:50, 30), sample(51:100, 30),
+                           sample(101:150, 30))
> iris.train <- iris[iris.subsample.index, ]
> iris.test  <- iris[-iris.subsample.index, ]
[Figure: iris petal length plotted over Sepal.Length and Petal.Width, shown separately for the training and testing sets.]
Example (Cont.)
Using bcart in the tgp package:
> bcart.iris <- bcart(X = iris.train[, c(1, 4)], XX = iris.test[, c(1, 4)],
+                     Z = iris.train[, 3], trace = TRUE, R = 5,
+                     BTE = c(2000, 10000, 2))
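In tgp's convention, BTE = c(2000, 10000, 2) asks for 10000 MCMC iterations, discarding the first 2000 as burn-in and keeping every second sample thereafter, while R = 5 repeats the run from five restarts, in the spirit of CGM's restarting advice.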
[Figure: bcart trace output: the posterior mean of z (predicted petal length) and the z quantile difference (error), plotted over Sepal.Length and Petal.Width.]
[Figure: three trees sampled from the posterior, of heights 3, 4, and 5 with log(p) = 118.534, 134.866, and 104.631 respectively. Each splits first on Petal.Width <> 1.5, with further splits on Sepal.Length and Petal.Width; leaf counts range from 11 to 30 observations.]
[Figure: observed vs. predicted petal length for the training and testing data, with points identified by species (setosa, versicolor, virginica).]
Extensions and Future Work
- Implementation!
- Inference methods: tree averaging
- Beyond the Gaussian:
  - heavy-tailed distributions
  - skew and count data
- Improved priors
- Improved sampling steps
Bibliography
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone.
Classification and Regression Trees.
Wadsworth Statistics/Probability Series. Wadsworth International Group, 1984.
Hugh A. Chipman, Edward I. George, and Robert E. McCulloch.
Bayesian CART model search.
Journal of the American Statistical Association, 93(443):935–960, September 1998.
Hugh A. Chipman, Edward I. George, and Robert E. McCulloch.
Hierarchical priors for Bayesian CART shrinkage.
Statistics and Computing, 10:17–24, 2000.
Hugh A. Chipman, Edward I. George, and Robert E. McCulloch.
Bayesian treed models.
Machine Learning, 48:299–320, 2002.
David G. T. Denison, Bani K. Mallick, and Adrian F. M. Smith.
A Bayesian CART algorithm.
Biometrika, 85(2):363–377, June 1998.
Wei-Yin Loh.
Classification and regression tree methods.
In Ruggeri, Kenett, and Faltin, editors, Encyclopedia of Statistics in Quality and Reliability, pages 315–323. Wiley, 2008.
Yuhong Wu, Håkon Tjelmeland, and Mike West.
Bayesian CART: prior specification and posterior simulation.
January 2006.