Machine Learning
CS 4633/6633 Artificial Intelligence
●   Introduction to machine learning
●   Learning decision trees (Sec 18.1 – 18.3)
Commercial Niches of ML
●   Data mining: using historical data to improve decisions
    – medical records => medical knowledge
●   Software applications we can't program by hand
    – autonomous driving
    – speech recognition
●   Self-customizing programs
    – newsreader that learns user interests
●   Another approach to difficult programming problems is to write a program that learns how to solve the problem
●   Robotics is an example where learning can be more effective than programming by hand
●   ALVINN: a neural net system developed at Carnegie-Mellon Univ. that learns how to drive a car by watching a human. It can drive on public highways at speeds up to 70 mph and for distances up to 90 miles at a time.

A model of learning agents (p. 526)
[Figure: learning-agent architecture. The Performance Standard feeds the Critic, which sends feedback to the Learning element; the Learning element sends changes to and receives knowledge from the Performance element and proposes learning goals to the Problem generator; the agent perceives the Environment through sensors and acts on it through effectors.]

Learning functions
●   "All learning can be seen as learning the representation of a function" (p. 529)
●   A function is a mapping from inputs to outputs
●   Examples of functions that can be learned:
    – given customer data, learn to predict credit risk
    – given medical records, learn to predict risk factors for disease
    – given an image of the road, learn to decide how to drive a car

Types of training
●   Types of feedback:
    – correct answers: supervised learning
    – occasional rewards: reinforcement learning
    – no feedback: unsupervised learning
●   On-line (incremental) versus off-line learning
Representations of knowledge
●   numerical parameters
●   decision trees
●   formal grammars
●   production rules
●   logical theories
●   graphs and networks
●   frames and schemas
●   computer programs (procedural encoding)
Decision trees
●   A decision tree takes as input a situation described by a set of attributes and returns a yes/no "decision".
●   A decision tree can represent any discrete-valued function (or more specifically, any propositional or Boolean function). It is a logical representation of a function.
Example: Waiting for a table
Attributes = properties used to describe examples
●   Alternate
●   Price ($, $$, $$$)
●   Bar
●   Raining
●   Fri/Sat
●   Reservation
●   Hungry
●   Patrons (None, Some, Full)
●   Type (French, Italian, Thai, Burger)
●   WaitEstimate (0-10, 10-30, 30-60, >60)
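To make the representation concrete, here is a minimal Python sketch (not from the slides): one restaurant situation encoded as a dictionary over the attributes above, classified by a small hand-written tree that tests only Patrons and Hungry, mirroring the partial tree built later in the lecture. The particular attribute values are invented for illustration.

# A situation is a mapping from attribute names to values.
# Attribute names and value sets are taken from the slide above;
# this particular example is invented for illustration.
example = {
    "Alternate": "Yes", "Bar": "No", "Fri/Sat": "Yes", "Hungry": "Yes",
    "Patrons": "Full", "Price": "$", "Raining": "No", "Reservation": "No",
    "Type": "Thai", "WaitEstimate": "10-30",
}

def will_wait(situation):
    """Hand-written decision tree fragment: test Patrons, then Hungry.

    Mirrors the partial tree constructed later in the lecture; a fully
    learned tree would test further attributes under Patrons = Full.
    """
    if situation["Patrons"] == "None":
        return "No"
    if situation["Patrons"] == "Some":
        return "Yes"
    # Patrons == "Full": fall back on Hungry
    return "Yes" if situation["Hungry"] == "Yes" else "No"

print(will_wait(example))   # -> Yes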
Inductive learning (sec. 18.2)
●   Supervised learning
●   Simplest form is learning a function f() from examples
●   Problem: given a training set of input/output examples of function f(), find a hypothesis h() that closely approximates f().

Inducing decision trees from examples
●   Performance element = decision tree
●   Learning element = decision tree learning algorithm
[Figure: the restaurant examples / decision tree; only the Yes/No classifications survived extraction.]
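As a minimal illustration of this setup (a sketch, not the textbook's code): a training set is a list of (input, output) pairs sampled from f(), and a candidate hypothesis h() can at least be checked for consistency with those pairs. The toy target function here is invented.

# Supervised learning setup, sketched with an invented target f(x) = "x is even".
training_set = [(x, x % 2 == 0) for x in range(10)]

def h(x):
    # A candidate hypothesis; here it happens to agree with f everywhere.
    return x % 2 == 0

def consistent(hypothesis, examples):
    """A hypothesis is consistent if it agrees with every training example."""
    return all(hypothesis(x) == y for x, y in examples)

print(consistent(h, training_set))   # -> True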
Decision trees and propositional logic
●   The function represented by a decision tree can also be represented by a set of if-then rules of propositional logic, in disjunctive normal form. (Each path through the tree represents a conjunction of propositions, and the entire tree represents a disjunction of such conjunctions.)
●   How many trees with N Boolean attributes? A table with 2^N rows specifies a function; there are 2^(2^N) such tables, and thus at least that many trees.

Problems
●   Many hypotheses are consistent with the training set
●   Simple hypotheses are preferred (Ockham's razor)
●   We want to find a hypothesis that is both consistent with the training set and simple
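A quick sanity check of the counting argument (a throwaway sketch, not part of the slides): enumerate every distinct truth table over N Boolean attributes and confirm there are 2^(2^N) of them.

from itertools import product

def count_boolean_functions(n):
    """Count distinct Boolean functions of n attributes by enumerating truth
    tables: each of the 2**n input rows may independently map to True/False."""
    rows = list(product([False, True], repeat=n))            # 2**n input rows
    tables = set(product([False, True], repeat=len(rows)))   # one output per row
    return len(tables)

for n in range(1, 4):
    print(n, count_boolean_functions(n), 2 ** (2 ** n))
# 1 4 4
# 2 16 16
# 3 256 256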
Constructing the decision tree
Construct a root node that includes all the examples, then for each node:
1. If there are both positive and negative examples, choose the best attribute to split them.
2. If all the examples are positive (negative), answer Yes (No).
3. If there are no examples for a case (no observed examples), choose a default based on the majority classification at the parent.
4. If there are no attributes left but we still have both positive and negative examples, the selected features are not sufficient for classification or there is error in the examples (we can use a majority vote).
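Below is a minimal Python sketch of this recursive procedure, following the four cases above. It is an illustration rather than the textbook's pseudocode, and choose_best_attribute is a placeholder (it just takes the first remaining attribute) since the information-gain heuristic is covered next class.

from collections import Counter

def majority(examples):
    """Majority classification among (attribute_dict, label) examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def choose_best_attribute(examples, attributes):
    # Placeholder for the "most significant attribute" heuristic (sec 18.4);
    # here we simply take the first attribute that is still available.
    return attributes[0]

def learn_tree(examples, attributes, values, default):
    """Sketch of recursive decision-tree construction (cases 1-4 on the slide).

    examples:   list of (attribute_dict, label) pairs
    attributes: attribute names still available for splitting
    values:     dict mapping each attribute name to its possible values
    default:    label to use when a branch receives no examples (case 3)
    """
    if not examples:                                    # case 3: no examples
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                                # case 2: all pos or all neg
        return labels.pop()
    if not attributes:                                  # case 4: attributes exhausted
        return majority(examples)                       # fall back on a majority vote
    best = choose_best_attribute(examples, attributes)  # case 1: split on best attribute
    remaining = [a for a in attributes if a != best]
    tree = {"attribute": best, "branches": {}}
    for v in values[best]:
        subset = [(x, y) for x, y in examples if x[best] == v]
        tree["branches"][v] = learn_tree(subset, remaining, values, majority(examples))
    return tree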
Splitting the examples
+: X1, X3, X4, X6, X8, X12
−: X2, X5, X7, X9, X10, X11
Patrons?
    None:  +: (none)           −: X7, X11
    Some:  +: X1, X3, X6, X8   −: (none)
    Full:  +: X4, X12          −: X2, X5, X9, X10
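The split above can be reproduced in a few lines of Python; the example labels and Patrons values are read directly off the slide.

from collections import defaultdict

# Classifications and Patrons values as shown on the slide.
positive = {"X1", "X3", "X4", "X6", "X8", "X12"}
patrons = {
    "X7": "None", "X11": "None",
    "X1": "Some", "X3": "Some", "X6": "Some", "X8": "Some",
    "X2": "Full", "X4": "Full", "X5": "Full",
    "X9": "Full", "X10": "Full", "X12": "Full",
}

# Partition the examples by the value of Patrons and list +/- per branch.
branches = defaultdict(lambda: {"+": [], "-": []})
for x, value in patrons.items():
    branches[value]["+" if x in positive else "-"].append(x)

for value in ("None", "Some", "Full"):
    print(value, branches[value])
# None {'+': [], '-': ['X7', 'X11']}
# Some {'+': ['X1', 'X3', 'X6', 'X8'], '-': []}
# Full {'+': ['X4', 'X12'], '-': ['X2', 'X5', 'X9', 'X10']}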
Splitting examples cont.
+: X1, X3, X4, X6, X8, X12
−: X2, X5, X7, X9, X10, X11
Type?
    French:   +: X1            −: X5
    Italian:  +: X6            −: X10
    Thai:     +: X4, X8        −: X2, X11
    Burger:   +: X3, X12       −: X7, X9

Splitting examples cont.
+: X1, X3, X4, X6, X8, X12
−: X2, X5, X7, X9, X10, X11
Patrons?
    None:  +: (none)           −: X7, X11          => No
    Some:  +: X1, X3, X6, X8   −: (none)           => Yes
    Full:  +: X4, X12          −: X2, X5, X9, X10  => Hungry?
        Hungry = Y:  +: X4, X12   −: X2, X10
        Hungry = N:  +: (none)    −: X5, X9
Decision tree learning algorithm
●   Basic idea is to build the tree greedily.
●   Choose the "most significant attribute" to be the root, then split the dataset according to that attribute's values and recurse on each subset.
●   Define "significance" using information theory (based on information gain or "entropy").
[Figure: the induced decision tree; only the Yes/No leaf labels survived extraction.]
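The gain calculation itself is next class's topic (sec 18.4), but as a preview here is a small sketch, using the standard entropy formula, that scores the two candidate splits shown earlier; it reproduces the intuition that Patrons is far more significant than Type.

from math import log2

def entropy(pos, neg):
    """Entropy (in bits) of a set with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

def information_gain(branches):
    """Parent entropy minus the weighted entropy of the branches.
    branches: one (pos, neg) pair of counts per attribute value."""
    pos = sum(p for p, _ in branches)
    neg = sum(n for _, n in branches)
    remainder = sum((p + n) / (pos + neg) * entropy(p, n) for p, n in branches)
    return entropy(pos, neg) - remainder

# (+, -) counts per branch, read off the splitting slides above.
patrons_split = [(0, 2), (4, 0), (2, 4)]         # None, Some, Full
type_split = [(1, 1), (1, 1), (2, 2), (2, 2)]    # French, Italian, Thai, Burger

print(information_gain(patrons_split))   # about 0.54 bits
print(information_gain(type_split))      # 0.0 bits -- Type tells us nothing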
Performance measurement
●   How do we measure how close our hypothesis h() is to f()?
●   Try h() on a test set
●   Measure the % of correct predictions on the test set as a function of the size of the training set: the "learning curve."

Typical learning curve
[Figure: a typical learning curve; % correct on the test set grows with the size of the training set.]
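A minimal sketch of that measurement loop (illustrative only; learn and data stand in for any learning algorithm and labeled data set, and each point averages several random train/test splits):

import random

def learning_curve(learn, data, sizes, trials=20):
    """Estimate % correct on a held-out test set for each training-set size.

    learn(train) must return a hypothesis h(x); data is a list of (x, y) pairs;
    sizes should be smaller than len(data) so a test set remains.
    Hypothetical interface, for illustration only.
    """
    curve = []
    for m in sizes:
        accuracy = 0.0
        for _ in range(trials):
            shuffled = random.sample(data, len(data))
            train, test = shuffled[:m], shuffled[m:]
            h = learn(train)
            accuracy += sum(h(x) == y for x, y in test) / len(test)
        curve.append((m, 100 * accuracy / trials))
    return curve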
Broadening the applicability
●   Handling examples with missing data
●   Handling multivalued attributes and classification
●   Continuous-valued attributes
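For the last point, one common trick (a sketch of the standard approach, not something covered on this slide) is to turn a continuous-valued attribute into Boolean tests by considering split thresholds between observed values:

def candidate_thresholds(values):
    """Midpoints between consecutive distinct values of a continuous attribute.

    Each threshold t yields a Boolean test (value <= t) that the tree-building
    procedure sketched earlier can score like any other attribute.
    Purely illustrative.
    """
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

print(candidate_thresholds([5, 15, 15, 40, 70]))   # -> [10.0, 27.5, 55.0]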
Next class (sec 18.4)
●   Information gain heuristic for choosing the most significant attribute
●   How to deal with noise in training data