Machine Learning
● Introduction to machine learning
● Learning decision trees (Sec 18.1 – 18.3)

CS 4633/6633 Artificial Intelligence

Commercial Niches of ML
● Software applications we can’t program by hand
  – autonomous driving
  – speech recognition
● Data mining: using historical data to improve decisions
  – medical records => medical knowledge
● Self-customizing programs
  – newsreader that learns user interests

A model of learning agents (p. 526)
[Figure: learning agent architecture. Components: performance standard, critic, learning element, performance element, problem generator. Sensors and effectors connect the agent to the environment. Arrow labels: feedback, changes, knowledge, learning goals.]

Learning functions
● “All learning can be seen as learning the representation of a function” (p. 529)
● A function is a mapping from inputs to outputs
● Examples of functions that can be learned:
  – given customer data, learn to predict credit risk
  – given medical records, learn to predict risk factors for disease
  – given image of road, learn to decide how to drive car

Types of training
Types of feedback:
● correct answers: supervised learning
● occasional rewards: reinforcement learning
● no feedback: unsupervised learning
On-line (incremental) versus off-line learning.

● Another approach to difficult programming problems is to write a program that learns how to solve the problem
● Robotics is an example where learning can be more effective than programming by hand
● ALVINN: A neural net system developed at Carnegie-Mellon Univ. learns how to drive a car by watching a human. Can drive on public highways at speeds up to 70 mph and for distances up to 90 miles at a time.
Representations of knowledge
● numerical parameters
● decision trees
● formal grammars
● production rules
● logical theories
● graphs and networks
● frames and schemas
● computer programs (procedural encoding)

Decision trees
● A decision tree takes as input a situation described by a set of attributes and returns a yes/no “decision”.
● A decision tree can represent any discrete-valued function (or more specifically, any propositional or Boolean function). It is a logical representation of a function.

Example: Waiting for a table
Attributes = properties used to describe examples
● Alternate
● Bar
● Fri/Sat
● Hungry
● Patrons (None, Some, Full)
● Price ($, $$, $$$)
● Raining
● Reservation
● Type (French, Italian, Thai, Burger)
● WaitEstimate (0-10, 10-30, 30-60, >60)

Inductive learning (sec. 18.2)
● Supervised learning
● Simplest form is learning a function f() from examples
● Problem: given a training set of input/output examples of function f(), find a hypothesis h() that closely approximates f().
● Performance element = decision tree
● Learning element = decision tree learning algorithm

Inducing decision trees from examples
[Figure: decision tree with Yes/No leaves; not recoverable from the extracted text.]

Decision trees and propositional logic
● The function represented by a decision tree can also be represented by a set of if-then rules of propositional logic, in disjunctive normal form. (Each path through the tree represents a conjunction of propositions, and the entire tree represents a disjunction of conjunctions of propositions.)
● How many trees with N Boolean attributes?
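For small N, the question of how many Boolean functions a tree might have to represent can be explored by brute force. The sketch below is only an illustration under one assumption: each distinct way of filling the output column of a truth table counts as one function. The name `count_boolean_functions` is hypothetical and does not come from the course material.

```python
from itertools import product


def count_boolean_functions(n: int) -> int:
    """Count Boolean functions of n Boolean attributes by enumeration."""
    # Every assignment of True/False to the n attributes is one table row.
    rows = list(product([False, True], repeat=n))
    # Every way of filling the output column (one entry per row)
    # defines a different function.
    output_columns = product([False, True], repeat=len(rows))
    return sum(1 for _ in output_columns)


# Compare the enumerated count with the closed form 2 ** (2 ** n).
for n in range(4):
    print(n, count_boolean_functions(n), 2 ** (2 ** n))
```

The enumeration is only practical for very small N, since the count grows doubly exponentially.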
● A truth table over N Boolean attributes has $2^N$ rows and specifies one function. There are $2^{2^N}$ such tables, and thus at least that many trees.

Problems
● Many hypotheses are consistent with the training set
● Simple hypotheses are preferred (Ockham’s razor)
● We want to find a hypothesis that is both consistent with the training set and simple

Constructing the decision tree
Construct a root node that includes all the examples, then for each node:
1. If there are both positive and negative examples, choose the best attribute to split them.
2. If all the examples are positive (negative), answer Yes (No).
3. If there are no examples for a case (no observed examples), choose a default based on the majority classification at the parent.
4. If there are no attributes left but we have both positive and negative examples, the selected features are not sufficient for classification or there is error in the examples (can use majority vote).

Splitting the examples
+: X1, X3, X4, X6, X8, X12
−: X2, X5, X7, X9, X10, X11
Patrons?
  None   +:                   −: X7, X11
  Some   +: X1, X3, X6, X8    −:
  Full   +: X4, X12           −: X2, X5, X9, X10

Splitting examples cont.
+: X1, X3, X4, X6, X8, X12
−: X2, X5, X7, X9, X10, X11
Type?
  French    +: X1        −: X5
  Italian   +: X6        −: X10
  Thai      +: X4, X8    −: X2, X11
  Burger    +: X3, X12   −: X7, X9

Splitting examples cont.
+: X1, X3, X4, X6, X8, X12
−: X2, X5, X7, X9, X10, X11
Patrons?
  None   +:                   −: X7, X11           => No
  Some   +: X1, X3, X6, X8    −:                   => Yes
  Full   +: X4, X12           −: X2, X5, X9, X10   => Hungry?
    Y   +: X4, X12   −: X2, X10
    N   +:           −: X5, X9

Decision tree learning algorithm
● Basic idea is to build the tree greedily.
● Choose the “most significant attribute” to be the root. Then split the dataset by the values of that attribute, and recurse on each subset.
● Define “significance” using information theory (based on information gain or “entropy”).
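The four construction cases and the greedy recursion can be sketched roughly as follows. This is a minimal illustration, not the course's implementation. It uses only the Patrons and Type values and the labels that can be read off the splits above; the other attributes are omitted. The significance measure `purity_score` is a hypothetical stand-in that counts how many examples land in single-class subsets; it is not the information-theoretic measure the slides refer to.

```python
from collections import Counter

# (Patrons, Type, label) for X1..X12, read off the splits above.
RAW = {
    "X1": ("Some", "French", "Yes"),   "X2": ("Full", "Thai", "No"),
    "X3": ("Some", "Burger", "Yes"),   "X4": ("Full", "Thai", "Yes"),
    "X5": ("Full", "French", "No"),    "X6": ("Some", "Italian", "Yes"),
    "X7": ("None", "Burger", "No"),    "X8": ("Some", "Thai", "Yes"),
    "X9": ("Full", "Burger", "No"),    "X10": ("Full", "Italian", "No"),
    "X11": ("None", "Thai", "No"),     "X12": ("Full", "Burger", "Yes"),
}
EXAMPLES = [{"Patrons": p, "Type": t, "label": y} for p, t, y in RAW.values()]
DOMAINS = {
    "Patrons": ["None", "Some", "Full"],
    "Type": ["French", "Italian", "Thai", "Burger"],
}


def majority(examples):
    """Most common label; ties go to the label encountered first."""
    return Counter(e["label"] for e in examples).most_common(1)[0][0]


def purity_score(attr, examples):
    """Stand-in significance: number of examples in single-class subsets."""
    score = 0
    for value in DOMAINS[attr]:
        subset = [e for e in examples if e[attr] == value]
        if len({e["label"] for e in subset}) == 1:
            score += len(subset)
    return score


def build_tree(examples, attributes, default):
    """Return a label (leaf) or a pair (attribute, {value: subtree})."""
    if not examples:                      # case 3: no observed examples
        return default
    labels = {e["label"] for e in examples}
    if len(labels) == 1:                  # case 2: all positive or all negative
        return labels.pop()
    if not attributes:                    # case 4: attributes exhausted
        return majority(examples)
    # case 1: mixed examples, so split on the best-scoring attribute
    best = max(attributes, key=lambda a: purity_score(a, examples))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in DOMAINS[best]:
        subset = [e for e in examples if e[best] == value]
        branches[value] = build_tree(subset, remaining, majority(examples))
    return (best, branches)


tree = build_tree(EXAMPLES, ["Patrons", "Type"], majority(EXAMPLES))
print(tree)
```

With only two attributes available, some leaves under Patrons = Full still contain both positive and negative examples, so case 4 (majority vote) is exercised; adding the remaining attributes would let the recursion separate them further.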
Typical learning curve
[Figure: learning curve plot.]

Performance measurement
How do we measure how close our hypothesis is to f()?
● Try h() on a test set
● Measure % correct predictions on the test set as a function of the size of the training set. “Learning curve.”

Broadening the applicability
● Handling examples with missing data
● Handling multivalued attributes and classification
● Continuous-valued attributes

Next class (sec 18.4)
● Information gain heuristic for choosing most significant attribute
● How to deal with noise in training data
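As a preview of the information gain heuristic, the sketch below compares the Patrons and Type splits of the twelve restaurant examples. The slides leave the definition for the next class, so this assumes the standard Shannon entropy in bits and takes gain to be the parent's entropy minus the size-weighted entropy of the subsets. The function names are illustrative, and the (positive, negative) counts are taken from the split slides.

```python
from math import log2


def entropy(pos: int, neg: int) -> float:
    """Shannon entropy (bits) of a set with pos positive, neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                       # treat 0 * log2(0) as 0
            p = count / total
            result -= p * log2(p)
    return result


def information_gain(parent, subsets):
    """parent and each subset are (positives, negatives) count pairs."""
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(*parent) - remainder


PARENT = (6, 6)                                  # six positive, six negative
PATRONS = [(0, 2), (4, 0), (2, 4)]               # None, Some, Full
TYPE = [(1, 1), (1, 1), (2, 2), (2, 2)]          # French, Italian, Thai, Burger

gain_patrons = information_gain(PARENT, PATRONS)
gain_type = information_gain(PARENT, TYPE)
print(f"Gain(Patrons) = {gain_patrons:.3f} bits")
print(f"Gain(Type)    = {gain_type:.3f} bits")
```

Under these assumptions Patrons yields a gain of roughly 0.54 bits while Type yields none, since every Type subset is split evenly between positive and negative examples.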