Machine Learning
Mehdi Ghayoumi
MSB rm 132
[email protected]
Ofc hr: Thur, 11-12 a

• “Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time.” –Herbert Simon
• “Learning is constructing or modifying representations of what is being experienced.” –Ryszard Michalski
• “Learning is making useful changes in our minds.” –Marvin Minsky

Decision Tree
• Hunt and colleagues used exhaustive-search decision-tree methods (CLS) to model human concept learning in the 1960s.
• In the late 1970s, Quinlan developed ID3, which uses the information gain heuristic, to learn expert systems from examples.
• Quinlan’s updated decision-tree package (C4.5) was released in 1993.

Classification: predict a categorical output from categorical and/or real inputs.
Decision trees are the most popular data mining tool:
• Easy to understand
• Easy to implement
• Easy to use
• Computationally cheap

• Extremely popular method
– Credit risk assessment
– Medical diagnosis
– Market analysis
– Bioinformatics
– Chemistry …

• Internal decision nodes
– Univariate: uses a single attribute, x_i
– Multivariate: uses all attributes, x
• Leaves
– Classification: class labels, or proportions
– Regression: numeric; r average, or local fit
• Learning is greedy; find the best split recursively

• Occam’s razor (year 1320):
– Prefer the simplest hypothesis that fits the data.
– The principle states that the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory.
• Albert Einstein: Make everything as simple as possible, but not simpler.
Why?
– It’s a philosophical problem.
– Simple explanations/classifiers are more robust
– Simple classifiers are more understandable

• Objective: shorter trees are preferred over larger trees.
• Idea: we want attributes that classify examples well. At each node the best attribute is selected: the one that partitions the learning set into subsets that are as “pure” as possible.

• Each branch corresponds to an attribute value
• Each internal node has a splitting predicate
• Each leaf node assigns a classification

• Entropy (disorder, impurity) of a set of examples, S, relative to a binary classification is:

$$\mathrm{Entropy}(S) = -p_1 \log_2(p_1) - p_0 \log_2(p_0)$$

where p_1 is the fraction of positive examples in S and p_0 is the fraction of negatives.

Example with p_1 = 9/14 and p_0 = 5/14:

$$\mathrm{Entropy}(S) = -\frac{9}{14}\log_2\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\left(\frac{5}{14}\right) \approx 0.94$$

• If all examples are in one category, entropy is zero (we define 0 log(0) = 0).
• If examples are equally mixed (p_1 = p_0 = 0.5), entropy reaches its maximum of 1.
• Entropy can be viewed as the number of bits required on average to encode the class of an example in S, where data compression (e.g. Huffman coding) is used to give shorter codes to more likely cases.
• For multi-class problems with c categories, entropy generalizes to:

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)$$

where p_i is the fraction of examples in S that belong to category i.

Thank you!
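
Supplementary sketch: the entropy definitions from the slides can be checked numerically with a few lines of Python. This is a minimal illustration, not the lecture's own code. The function names `entropy` and `information_gain` are chosen here for illustration, and the two-way split of the 9/5 example (6+/2- and 3+/3-) is a hypothetical partition made up to show how a candidate attribute would be scored. Information gain is taken in its usual form: parent entropy minus the size-weighted entropy of the subsets.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a sequence of class labels.

    Classes with zero count never appear in the Counter, so the
    convention 0*log2(0) = 0 holds automatically.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # p * log2(1/p) equals -p * log2(p) and avoids returning -0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


# The binary example from the slides: 9 positives, 5 negatives
s = ["+"] * 9 + ["-"] * 5
print(round(entropy(s), 2))  # 0.94

# Hypothetical split of S by some attribute (values invented for illustration)
groups = [["+"] * 6 + ["-"] * 2, ["+"] * 3 + ["-"] * 3]
print(round(information_gain(s, groups), 3))  # 0.048
```

A greedy tree learner in the spirit of the slides would score every candidate attribute this way and split on the one with the largest gain, i.e. the one producing the purest subsets.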