Learning Decision Trees from Data Streams
Blaž Sovdat
Artificial Intelligence Laboratory, Jožef Stefan Institute

Problem Statement

Learning decision trees from high-speed data streams puts severe space and time constraints on the learning algorithm: it must process examples sequentially, as fast as possible, using constant space, independent of the size of the stream. The most expensive operation in the Hoeffding tree algorithm is the recomputation of attribute heuristic estimates [2]. Another important problem is the handling of numeric attributes in regression, where the memory-intensive BST technique [3] is used.

Our Solution

We partially solve the recomputation problem by proposing updating formulas and algorithms for information gain and the Gini index, the most popular heuristic measures for classification. Using the formulas of [1], we adapt histogram-based binarization [2] to regression trees; this approach is fast and uses constant space, but computes coarser splits and needs parameter tuning.

Updating Formulas for Gini Index and Entropy

(i) Let n_k denote the number of examples from the k-th class, and let n = n_1 + ... + n_r, where r is the number of classes.
(ii) Let s_k denote the number of newly arrived examples from the k-th class, let s = s_1 + ... + s_r, and let I be the set of counts that changed.
(iii) Let H' and H'' denote entropies of the counts for two different sets of classes, and let H be the entropy of the union.
(iv) Let G' and G'' denote Gini indices of the counts for two different sets of classes, and let G be the Gini index of the union.

Equation (1) gives the definitions of the Gini index and entropy as typically used by machine learning practitioners, with \lg = \log_2 and the convention 0 \lg 0 = 0:

    G = 1 - \sum_{k=1}^{r} \left(\frac{n_k}{n}\right)^2, \qquad H = -\sum_{k=1}^{r} \frac{n_k}{n} \lg \frac{n_k}{n}    (1)

Equation (2) gives the updating formula for entropy when several counts increase; Equation (3) gives the updating formula for the entropy of the union of the counts of two different sets of classes, in terms of the two entropies:

    H = \frac{n}{n+s}\left(H' - \lg\frac{n}{n+s}\right) - \sum_{k \in I}\left(\frac{n_k+s_k}{n+s}\lg\frac{n_k+s_k}{n+s} - \frac{n_k}{n+s}\lg\frac{n_k}{n+s}\right)    (2)

    H = \frac{n'}{n'+n''}\left(H' - \lg\frac{n'}{n'+n''}\right) + \frac{n''}{n'+n''}\left(H'' - \lg\frac{n''}{n'+n''}\right)    (3)

Equation (4) gives the updating formula for the Gini index when several counts increase; Equation (5) gives the updating formula for the Gini index of the union of the counts of two different sets of classes, in terms of the two Gini indices:

    G = 1 - \frac{1}{(n+s)^2}\left(n^2(1-G') + \sum_{k \in I}\left(2 n_k s_k + s_k^2\right)\right)    (4)

    G = 1 - \frac{1}{(n'+n'')^2}\left(n'^2(1-G') + n''^2(1-G'')\right)    (5)

Algorithm for Entropy

Let W := {} be a sliding window of size w, let H := 0 be the current entropy, and let S be a data stream.

    for x ∈ S do
        if |W| ≥ w then
            remove the oldest element x_0, from the k-th class, from the sliding window W
            update H := Dec(H, n, n_k)
        suppose x is from the k-th class
        add it to the window: W := W ∪ {x}
        update H := Inc(H, n, n_k)

Inc and Dec update the entropy when a single example enters or leaves the window; Add and Del update it when a whole class of n_k examples joins or leaves the counts. Each function first updates the counts, so the returned expressions use the new values of n and n_k.

    function Inc(H, n, n_k)
        update n := n + 1 and n_k := n_k + 1
        return ((n-1)/n) H - (n_k/n) lg(n_k/n) + ((n_k-1)/n) lg((n_k-1)/n) - ((n-1)/n) lg((n-1)/n)

    function Dec(H, n, n_k)
        update n := n - 1 and n_k := n_k - 1
        return ((n+1)/n) H + ((n_k+1)/n) lg((n_k+1)/(n+1)) - (n_k/n) lg(n_k/(n+1)) + lg(n/(n+1))

    function Add(H, n, n_k)
        update n := n + n_k
        return ((n-n_k)/n) H - ((n-n_k)/n) lg((n-n_k)/n) - (n_k/n) lg(n_k/n)

    function Del(H, n, n_k)
        update n := n - n_k
        return ((n+n_k)/n) H + (n_k/n) lg(n_k/(n+n_k)) + lg(n/(n+n_k))
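To make the windowed entropy maintenance concrete, here is a minimal runnable Python sketch of the algorithm above. It is a sketch in our own notation (the names xlg, inc, dec, and StreamEntropy are ours, not the poster's): it applies Inc and Dec exactly as in the pseudocode and cross-checks the incremental value against a batch evaluation of Equation (1).

```python
import math
from collections import deque, defaultdict

def xlg(p):
    """p * lg(p), with the convention 0 * lg 0 = 0."""
    return p * math.log2(p) if p > 0 else 0.0

def inc(H, n, nk):
    """Entropy after one example of class k enters the window.
    n and nk are the counts *after* the update n := n + 1, nk := nk + 1."""
    return ((n - 1) / n) * H - xlg(nk / n) + xlg((nk - 1) / n) - xlg((n - 1) / n)

def dec(H, n, nk):
    """Entropy after one example of class k leaves the window.
    n and nk are the counts *after* the update n := n - 1, nk := nk - 1."""
    if n == 0:
        return 0.0
    out = ((n + 1) / n) * H \
        + ((nk + 1) / n) * math.log2((nk + 1) / (n + 1)) \
        + math.log2(n / (n + 1))
    if nk > 0:
        out -= (nk / n) * math.log2(nk / (n + 1))
    return out

class StreamEntropy:
    """Sliding window of size w whose entropy is maintained incrementally."""
    def __init__(self, w):
        self.w, self.window = w, deque()
        self.counts, self.n, self.H = defaultdict(int), 0, 0.0

    def update(self, label):
        if len(self.window) >= self.w:       # evict the oldest element first
            old = self.window.popleft()
            self.counts[old] -= 1
            self.n -= 1
            self.H = dec(self.H, self.n, self.counts[old])
        self.window.append(label)
        self.counts[label] += 1
        self.n += 1
        self.H = inc(self.H, self.n, self.counts[label])
        return self.H

h, se = 0.0, StreamEntropy(w=3)
for x in ["a", "a", "b", "b", "a", "c"]:
    h = se.update(x)
batch = -sum(xlg(c / se.n) for c in se.counts.values() if c > 0)
assert abs(h - batch) < 1e-9  # incremental entropy matches batch recomputation
```

The same window loop works for the Gini index with the corresponding Inc and Dec functions given below.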
Algorithm for Gini Index

Let W := {} be a sliding window of size w, let G := 0 be the current Gini index, and let S be a data stream.

    for x ∈ S do
        if |W| ≥ w then
            remove the oldest element x_0, labeled with the k-th class, from the sliding window W
            update G := Dec(G, n, n_k)
        add x, labeled with the k-th class, to the window: W := W ∪ {x}
        update G := Inc(G, n, n_k)

    function Inc(G, n, n_k)
        update n := n + 1 and n_k := n_k + 1
        return 1 - (1/n^2) ((n-1)^2 (1-G) + 2 n_k - 1)

    function Dec(G, n, n_k)
        update n := n - 1 and n_k := n_k - 1
        return 1 - (1/n^2) ((n+1)^2 (1-G) - 2 n_k - 1)

    function Add(G, n, n_k)
        update n := n + n_k
        return 1 - (1/n^2) ((n-n_k)^2 (1-G) + n_k^2)

    function Del(G, n, n_k)
        update n := n - n_k
        return 1 - (1/n^2) ((n+n_k)^2 (1-G) - n_k^2)

References

[1] Tony F. Chan, Gene H. Golub, and Randall J. LeVeque. Updating formulae and a pairwise algorithm for computing sample variances. Technical report, Stanford University, Stanford, CA, USA, 1979.
[2] Pedro Domingos and Geoff Hulten. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '00), pages 71–80, New York, NY, USA, 2000. ACM.
[3] Elena Ikonomovska, João Gama, and Sašo Džeroski. Learning model trees from evolving data streams. Data Mining and Knowledge Discovery, 23(1):128–168, July 2011.
[4] Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming. Addison-Wesley, third edition, November 1997.
[5] Blaž Sovdat and Zoran Bosnić. Incremental decision tree learning from time-changing data streams. Preprint, 2014.

Numeric Attribute Binarization for Regression

• Maintain a histogram of 100 bins in each leaf for each numeric attribute.
• The bins are initialized with 100 unique attribute values v_1 < v_2 < ... < v_100.
• When a new example arrives, update the bin closest to its attribute value.
• Compute the standard deviation reduction (SDR) of all 99 possible unions.
• Split on the value v_i such that the unions of bins 1 to i and i+1 to 100 maximize the SDR.
• Compute the SDR efficiently using updating formulas [1, 4] for mean and variance; see the sketch below.

[Figure: histogram over bins v_1, ..., v_5 (x-axis: Bins, y-axis: Average target value) with bar heights 23.35, 20.1, 7.12, 5.66, 4.3; annotated "Split on v_3 or v_4".]
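To make the last two bullets concrete, here is a minimal Python sketch of the split-selection step, assuming each bin keeps a (count, mean, sum-of-squared-deviations) triple: single-point updates follow Knuth's recurrence [4] and disjoint sets are combined with the pairwise formula of Chan et al. [1]. The names (Bin, nearest, best_split) and the five-bin toy data are our illustrative assumptions, not the poster's implementation.

```python
import math

class Bin:
    """One histogram bin: running count, mean, and sum of squared deviations."""
    def __init__(self, value):
        self.value = value                 # representative attribute value v_i
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, y):
        # Single-point mean/variance update (Knuth [4]).
        self.n += 1
        d = y - self.mean
        self.mean += d / self.n
        self.m2 += d * (y - self.mean)

def nearest(bins, x):
    """Route an example to the bin whose representative value is closest to x."""
    return min(bins, key=lambda b: abs(b.value - x))

def merge(a, b):
    """Combine (n, mean, m2) triples of two disjoint sets (Chan et al. [1])."""
    n = a[0] + b[0]
    if n == 0:
        return (0, 0.0, 0.0)
    d = b[1] - a[1]
    return (n, a[1] + d * b[0] / n, a[2] + b[2] + d * d * a[0] * b[0] / n)

def sd(s):
    """Standard deviation of an (n, mean, m2) triple."""
    return math.sqrt(s[2] / s[0]) if s[0] > 0 else 0.0

def best_split(bins):
    """Evaluate all len(bins) - 1 unions; return the value maximizing the SDR."""
    suffix = [(0, 0.0, 0.0)] * (len(bins) + 1)
    for j in range(len(bins) - 1, -1, -1):  # stats of bins j..end, right to left
        suffix[j] = merge((bins[j].n, bins[j].mean, bins[j].m2), suffix[j + 1])
    total, left = suffix[0], (0, 0.0, 0.0)
    if total[0] == 0:
        return bins[0].value                # no data routed to any bin yet
    best_i, best_sdr = 0, float("-inf")
    for i in range(len(bins) - 1):          # candidate split after bin i
        left = merge(left, (bins[i].n, bins[i].mean, bins[i].m2))
        right = suffix[i + 1]
        sdr = sd(total) - (left[0] / total[0]) * sd(left) \
                        - (right[0] / total[0]) * sd(right)
        if sdr > best_sdr:
            best_i, best_sdr = i, sdr
    return bins[best_i].value

bins = [Bin(v) for v in (1.0, 2.0, 3.0, 4.0, 5.0)]   # 5 bins instead of 100
for x, y in [(1.1, 23.0), (2.0, 20.0), (2.9, 7.0), (4.0, 5.5), (5.1, 4.3)]:
    nearest(bins, x).add(y)
print(best_split(bins))  # 2.0: left union {v_1, v_2}, right union {v_3, v_4, v_5}
```

The suffix scan keeps the evaluation linear in the number of bins, which matters when all 99 candidate unions are re-evaluated at every split attempt.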