Learning Decision Trees from Data Streams
Blaž Sovdat
Artificial Intelligence Laboratory, Jožef Stefan Institute
Problem Statement
Learning decision trees from high-speed data streams puts severe space and time constraints on the learning algorithm: process examples sequentially, as fast as possible, using constant space, independent of the size of the stream. The most expensive operation in the Hoeffding tree algorithm is the recomputation of attribute heuristic estimates [2]. Another important problem is the handling of numeric attributes in regression, where the memory-intensive BST technique [3] is used.
Our Solution
We partially solve the recomputation problem by proposing updating formulas and algorithms for information gain and the Gini index, the most popular heuristic measures for classification. Using the formulas of [1], we adapt histogram-based binarization [2] for regression trees; this approach is fast and uses constant space, but it computes cruder splits and needs parameter tuning.
Updating Formulas for Gini Index and Entropy
Algorithm for Entropy
(i) Let nk denote the number of examples from the k-th class, and let n = n1 + ... + nr, where r is the number of classes.
Let W := {} be a sliding window, let H := 0 be the current entropy, and let S be a data stream
for x ∈ S do
  if |W| ≥ w then
    Remove the oldest element x0, labeled with the k-th class, from the sliding window W
    Update H := Dec(H, n, nk)
  Suppose x is labeled with the k-th class
  Add it to the window: W := W ∪ {x}
  Update H := Inc(H, n, nk)
function Add(H, n, nk)
  Update n := n + nk
  return ((n − nk)/n) H − ((n − nk)/n) log2((n − nk)/n) − (nk/n) log2(nk/n)
(ii) Let sk denote the number of newly arrived examples from the k-th class, let s = s1 + ... + sr, and let I be the set of indices of the counts that changed.
(iii) Let H' and H'' denote the entropies of the counts of two disjoint sets of classes, and let H be the entropy of their union.
(iv) Let G' and G'' denote the Gini indices of the counts of two disjoint sets of classes, and let G be the Gini index of their union.
Equation (1) gives the definitions of entropy and the Gini index as typically used by machine learning practitioners.
Definitions of Gini index and entropy

G = 1 - \sum_{k=1}^{r} \left( \frac{n_k}{n} \right)^2, \qquad H = - \sum_{k=1}^{r} \frac{n_k}{n} \lg \frac{n_k}{n}    (1)
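To ground the updating formulas below, here is a minimal Python sketch that computes Equation (1) directly from a list of class counts; the function names and the example counts are illustrative, not from the poster:

from math import log2

def entropy(counts):
    # H of Equation (1); 0 * lg 0 is treated as 0, hence the nk > 0 filter.
    n = sum(counts)
    return -sum((nk / n) * log2(nk / n) for nk in counts if nk > 0)

def gini(counts):
    # G of Equation (1).
    n = sum(counts)
    return 1 - sum((nk / n) ** 2 for nk in counts)

# Two classes with 2 examples each: maximal entropy of 1 bit, Gini index of 1/2.
assert abs(entropy([2, 2]) - 1.0) < 1e-12
assert abs(gini([2, 2]) - 0.5) < 1e-12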
Equation (2) gives the updating formula for entropy when several counts increase; Equation (3) gives the updating formula for the entropy of the union of the counts of two disjoint sets of classes, in terms of the two entropies.
Updating Formulas for Entropy
H = \frac{n}{n+s} \Big( H' - \lg \frac{n}{n+s} \Big) - \sum_{k \in I} \Big( \frac{n_k + s_k}{n+s} \lg \frac{n_k + s_k}{n+s} - \frac{n_k}{n+s} \lg \frac{n_k}{n+s} \Big)    (2)

H = \frac{n'}{n'+n''} \Big( H' - \lg \frac{n'}{n'+n''} \Big) + \frac{n''}{n'+n''} \Big( H'' - \lg \frac{n''}{n'+n''} \Big)    (3)
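As a sanity check, a hedged Python sketch of Equation (2), verified against direct recomputation; entropy is as in the sketch above, and the counts are an arbitrary example of mine:

from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def entropy_update(H, counts, arrivals):
    # Equation (2): new entropy after counts[k] grows by arrivals[k] for k in I.
    n, s = sum(counts), sum(arrivals.values())
    H_new = (n / (n + s)) * (H - log2(n / (n + s)))
    for k, sk in arrivals.items():
        nk = counts[k]
        H_new -= ((nk + sk) / (n + s)) * log2((nk + sk) / (n + s))
        if nk > 0:  # 0 * lg 0 contributes nothing
            H_new += (nk / (n + s)) * log2(nk / (n + s))
    return H_new

old, new = [3, 1, 4], [3, 3, 5]  # classes 1 and 2 receive 2 and 1 new examples
assert abs(entropy_update(entropy(old), old, {1: 2, 2: 1}) - entropy(new)) < 1e-12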
function Del(H, n, nk)
  Update n := n − nk
  return ((n + nk)/n) H + (nk/n) log2(nk/(n + nk)) + log2(n/(n + nk))
function Inc(H, n, nk)
  Update n := n + 1 and nk := nk + 1
  return ((n − 1)/n) (H − log2((n − 1)/n)) − (nk/n) log2(nk/n) + ((nk − 1)/n) log2((nk − 1)/n)
Equation (4) gives the updating formula for the Gini index when several counts increase; Equation (5) gives the updating formula for the Gini index of the union of the counts of two disjoint sets of classes, in terms of the two Gini indices.
Updating Formulas for Gini index

G = 1 - \frac{1}{(n+s)^2} \Big( n^2 (1 - G') + \sum_{k \in I} (2 n_k s_k + s_k^2) \Big)    (4)

G = 1 - \frac{n'^2 (1 - G') + n''^2 (1 - G'')}{(n' + n'')^2}    (5)
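The same kind of check for Equations (4) and (5); again a sketch with my own helper names and example counts:

def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_update(G, counts, arrivals):
    # Equation (4): new Gini index after counts[k] grows by arrivals[k] for k in I.
    n, s = sum(counts), sum(arrivals.values())
    bump = sum(2 * counts[k] * sk + sk * sk for k, sk in arrivals.items())
    return 1 - (n * n * (1 - G) + bump) / ((n + s) ** 2)

def gini_union(G1, n1, G2, n2):
    # Equation (5): Gini index of the union of two disjoint sets of classes.
    return 1 - (n1 * n1 * (1 - G1) + n2 * n2 * (1 - G2)) / ((n1 + n2) ** 2)

old, new = [3, 1, 4], [3, 3, 5]
assert abs(gini_update(gini(old), old, {1: 2, 2: 1}) - gini(new)) < 1e-12
assert abs(gini_union(gini([3, 1]), 4, gini([4]), 4) - gini([3, 1, 4])) < 1e-12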
function Dec(H, n, nk)
  Update n := n − 1 and nk := nk − 1
  return ((n + 1)/n) H + ((nk + 1)/n) log2((nk + 1)/(n + 1)) − (nk/n) log2(nk/(n + 1)) + log2(n/(n + 1))
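Putting the entropy panel together, a minimal runnable Python sketch of the sliding-window algorithm built on Inc and Dec; the guards for empty counts and the choice of window size w are my assumptions:

from collections import deque
from math import log2

def inc(H, n, nk):
    # Inc(H, n, nk) from the poster; n and nk are the already-updated counts.
    if n == 1:
        return 0.0  # a single example has zero entropy
    out = ((n - 1) / n) * (H - log2((n - 1) / n)) - (nk / n) * log2(nk / n)
    if nk > 1:  # 0 * lg 0 contributes nothing
        out += ((nk - 1) / n) * log2((nk - 1) / n)
    return out

def dec(H, n, nk):
    # Dec(H, n, nk) from the poster; n and nk are the already-updated counts.
    out = ((n + 1) / n) * H + ((nk + 1) / n) * log2((nk + 1) / (n + 1)) + log2(n / (n + 1))
    if nk > 0:
        out -= (nk / n) * log2(nk / (n + 1))
    return out

def sliding_entropy(stream, w):
    # Yields the entropy of the class labels of the last w (or fewer) examples.
    window, counts, H = deque(), {}, 0.0
    for k in stream:
        if len(window) >= w:
            k0 = window.popleft()  # evict the oldest label first
            counts[k0] -= 1
            H = dec(H, len(window), counts[k0])
        window.append(k)
        counts[k] = counts.get(k, 0) + 1
        H = inc(H, len(window), counts[k])
        yield H

print(list(sliding_entropy([0, 1, 0, 0, 1, 1], w=4)))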
Algorithm for Gini index
Let W := {} be a sliding window, let G := 0 be the current Gini index, and let S be a data stream
for x ∈ S do
  if |W| ≥ w then
    Remove the oldest element x0, labeled with the k-th class, from the sliding window W
    Update G := Dec(G, n, nk)
  Suppose x is labeled with the k-th class
  Add it to the window: W := W ∪ {x}
  Update G := Inc(G, n, nk)
function Add(G, n, nk)
  Update n := n + nk
  return 1 − (1/n^2) ((n − nk)^2 (1 − G) + nk^2)
function Del(G, n, nk)
  Update n := n − nk
  return 1 − (1/n^2) ((n + nk)^2 (1 − G) − nk^2)
function Inc(G, n, nk)
  Update n := n + 1 and nk := nk + 1
  return 1 − (1/n^2) ((n − 1)^2 (1 − G) + 2 nk − 1)
function Dec(G, n, nk)
  Update n := n − 1 and nk := nk − 1
  return 1 − (1/n^2) ((n + 1)^2 (1 − G) − 2 nk − 1)
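The Gini counterparts involve no logarithms and need no special cases; a short Python sketch mirroring Inc and Dec above (names mine):

def gini_inc(G, n, nk):
    # Inc(G, n, nk): Gini index after one example of class k arrives; counts already updated.
    return 1 - ((n - 1) ** 2 * (1 - G) + 2 * nk - 1) / (n * n)

def gini_dec(G, n, nk):
    # Dec(G, n, nk): Gini index after one example of class k leaves; counts already updated.
    return 1 - ((n + 1) ** 2 * (1 - G) - 2 * nk - 1) / (n * n)

# Counts {2, 2} (G = 1/2) gain one example of class 0, giving counts {3, 2}.
G = gini_inc(0.5, 5, 3)
assert abs(G - (1 - (9 + 4) / 25)) < 1e-12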
References
[1] Tony F. Chan, Gene H. Golub, and Randall J. LeVeque. Updating formulae and a pairwise algorithm for computing sample variances. Technical report, Stanford, CA, USA, 1979.
[2] Pedro Domingos and Geoff Hulten. Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '00, pages 71–80, New York, NY, USA, 2000. ACM.
[3] Elena Ikonomovska, João Gama, and Sašo Džeroski. Learning model trees from evolving data streams. Data Min. Knowl. Discov., 23(1):128–168, July 2011.
[4] Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming. Addison-Wesley, third edition, November 1997.
[5] Blaž Sovdat and Zoran Bosnić. Incremental decision tree learning from time-changing data streams. 2014. Preprint.
Numeric Attribute Binarization for Regression
• Maintain a histogram of 100 bins in each leaf for each numeric attribute
• The bins are initialized with the first 100 unique attribute values v1 < v2 < . . . < v100
• When a new example arrives, update the bin closest to the attribute value
• Compute the standard deviation reduction (SDR) of all 99 possible splits of the bins into two unions
• Split on the value vi such that the unions of bins 1 to i and i + 1 to 100 maximize the SDR
[Figure: example histogram with five bins v1 < v2 < v3 < v4 < v5, whose average target values are 23.35, 20.1, 7.12, 5.66, and 4.3; the best split here is on v3 or v4]
• Compute the SDR efficiently using the updating formulas for mean and variance [1, 4] (see the sketch below)
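A hedged Python sketch of the split-selection step described above: per-bin summaries are maintained with Welford's single-pass update [4] and combined with the pairwise formula of Chan et al. [1]; the class and function names and the fixed bin layout are my simplifications:

from math import sqrt

class Stat:
    # Running (count, mean, M2) summary of the target values in one bin.
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
    def add(self, y):
        # Welford's single-pass update [4].
        self.n += 1
        d = y - self.mean
        self.mean += d / self.n
        self.m2 += d * (y - self.mean)

def merge(a, b):
    # Pairwise combination of two summaries (Chan et al. [1]).
    out = Stat()
    out.n = a.n + b.n
    if out.n == 0:
        return out
    d = b.mean - a.mean
    out.mean = a.mean + d * b.n / out.n
    out.m2 = a.m2 + b.m2 + d * d * a.n * b.n / out.n
    return out

def sd(s):
    return sqrt(s.m2 / s.n) if s.n > 0 else 0.0

def best_split(stats):
    # stats[i] summarizes the targets that fell into bin i (bins ordered by value).
    # SDR of a cut after bin i: sd(all) - n_L/n * sd(left) - n_R/n * sd(right).
    prefix, acc = [], Stat()
    for s in stats:
        acc = merge(acc, s)
        prefix.append(acc)
    total = prefix[-1]
    best, suffix = (float("-inf"), None), Stat()
    for i in range(len(stats) - 1, 0, -1):
        suffix = merge(suffix, stats[i])   # bins i .. end form the right union
        left = prefix[i - 1]               # bins 0 .. i-1 form the left union
        sdr = sd(total) - (left.n / total.n) * sd(left) - (suffix.n / total.n) * sd(suffix)
        best = max(best, (sdr, i - 1))
    return best  # (best SDR, index of the last bin in the left union)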