Machine Learning
Foundations of Artificial Intelligence
Learning
What is Learning?
Learning in AI is also called machine learning or pattern recognition.
The basic objective is to allow an intelligent agent to autonomously
discover knowledge from experience.
Let’s examine the definition more closely:
“an intelligent agent”: The ability to learn requires a prior level of intelligence and
knowledge. Learning has to start from an existing level of capability.
“to discover autonomously”: Learning is fundamentally about an agent
recognizing new facts for its own use and acquiring new abilities that reinforce its
own existing abilities. Literal programming, i.e. rote learning from instruction, is
not useful.
“knowledge”: Whatever is learned has to be represented in some way that the
agent can use. “If you can't represent it, you can't learn it” is a corollary of the
slogan “Knowledge is power”.
“from experience”: Experience is typically a set of so-called training examples;
examples may be categorized or not. They may be random or selected by a
teacher. They may include explanations or not.
Learning Agent
[Diagram: an agent embedded in its environment. Percepts arrive through sensors; a critic evaluates them and feeds the learning element, which updates the KB; the problem solver uses the KB to choose actions, carried out through actuators.]
Learning element
Design of a learning element is affected by
Which components of the performance element are to be learned
What feedback is available to learn these components
What representation is used for the components
Type of feedback:
Supervised learning: correct answers for each training example
Unsupervised learning: correct answers not given
Reinforcement learning: occasional rewards/feedback
Inductive Learning
inductive learning involves learning generalized rules from specific examples (can think of
this as the “inverse” of deduction)
main task: given a set of examples, each classified as positive or negative, produce a concept
description that matches exactly the positive examples
Some Notes:
The examples are coded in some representation language, e.g. they are coded by a finite set
of real-valued features.
The concept description is in a certain language that is presumably a superset of the language
of possible example encodings.
A “correct” concept description is one that classifies correctly ALL possible examples, not
just those given in the training set.
Fundamental Difficulties with Induction
can’t generalize with perfect certainty
examples and concepts are NOT available “directly”; they are only available through
representations which may be more or less adequate to capture them
some examples may be classified as both positive and negative
the features supplied may not be sufficient to discriminate between positive and negative
examples
Inductive Learning Frameworks
1. Function-learning formulation
2. Logic-inference formulation
Inductive learning
Simplest form: learn a function from examples
f is the target function
An example is a pair (x, f(x))
Problem: find a hypothesis h
such that h ≈ f
given a training set of examples
This is a highly simplified model of real learning:
Ignores prior knowledge
Assumes examples are given
Inductive learning
Construct/adjust h to agree with f on training set
h is consistent if it agrees with f on all examples
E.g., curve fitting:
Ockham’s razor: prefer the simplest hypothesis consistent
with data
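The Ockham's-razor preference can be sketched in code: enumerate hypothesis classes from simplest to more complex and return the first hypothesis consistent with the training set. The data, the tolerance, and the two hypothesis classes (constant, line) are illustrative choices, not from the slides.

```python
def fit_constant(pts):
    c = pts[0][1]                        # a constant hypothesis h(x) = c
    return lambda x: c

def fit_line(pts):
    (x0, y0), (x1, y1) = pts[0], pts[1]  # the line through the first two points
    m = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + m * (x - x0)

def consistent(h, pts, tol=1e-9):
    return all(abs(h(x) - y) <= tol for x, y in pts)

def simplest_consistent(pts):
    # Try hypothesis classes in order of increasing complexity and keep
    # the first hypothesis that agrees with f on every training example.
    for name, fit in [("constant", fit_constant), ("line", fit_line)]:
        h = fit(pts)
        if consistent(h, pts):
            return name, h
    return "none", None

pts = [(0, 1), (1, 3), (2, 5), (3, 7)]   # examples (x, f(x)) with f(x) = 2x + 1
name, h = simplest_consistent(pts)
print(name)  # line
```

The constant hypothesis is rejected because it disagrees with the data; the line is the simplest hypothesis consistent with all four examples.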
Logic-Inference Formulation
Background knowledge KB
Training set D (observed knowledge)
that is not logically implied by KB
Inductive inference:
Find h (inductive hypothesis) such that KB ∧ h ⊨ D
h = D is a trivial, but uninteresting, solution (data caching)
Induction is usually not a sound inference
Rewarded Card Example
Deck of cards, with each card designated by [r,s], its rank and
suit, and some cards “rewarded”
Background knowledge KB:
((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
((s=S) ∨ (s=C)) ⇔ BLACK(s)
((s=D) ∨ (s=H)) ⇔ RED(s)
Training set D:
REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
Possible inductive hypothesis:
h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]))
Note: There are several possible inductive hypotheses
Learning a Predicate
Set E of objects (e.g., cards)
Goal predicate CONCEPT(x), where x is an object in E,
takes the value True or False (e.g., REWARD)
Observable predicates A(x), B(x), …
e.g., NUM, RED
Training set
values of CONCEPT for some combinations of values of the observable
predicates
A Possible Training Set

  Ex.#   A      B      C      D      E      CONCEPT
   1     True   True   False  True   False  False
   2     True   False  False  False  False  True
   3     False  False  True   True   True   False
   4     True   True   True   False  True   True
   5     False  True   True   False  False  False
   6     True   True   False  True   True   False
   7     False  False  True   False  True   False
   8     True   False  True   False  True   True
   9     False  False  False  True   True   False
  10     True   True   True   True   False  True
Note that the training set does not say whether
an observable predicate A, …, E is pertinent or not
Learning a Predicate
Set E of objects (e.g., cards)
Goal predicate CONCEPT(x), where x is an object in E,
takes the value True or False (e.g., REWARD)
Observable predicates A(x), B(x), …
e.g., NUM, RED
Training set
values of CONCEPT for some combinations of values of the observable predicates
Find a representation of CONCEPT in the form:
CONCEPT(x) ⇔ S(A,B,…)
where S(A,B,…) is a sentence built with the observable predicates, e.g.:
CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
Example set
An example consists of the values of CONCEPT and the
observable predicates for some object x
An example is positive if CONCEPT is True; otherwise it is
negative
The set X of all examples is the example set
The training set is a subset of X
Hypothesis Space
An hypothesis is any sentence h of the form:
CONCEPT(x) ⇔ S(A,B,…)
where S(A,B,…) is a sentence built with the observable
predicates
The set of all hypotheses is called the hypothesis space H
An hypothesis h agrees with an example if it gives the
correct value of CONCEPT
Inductive Learning Scheme
[Diagram: the training set D, a cloud of + and - examples, is a subset of the example set X = {[A, B, …, CONCEPT]}; the learner maps D to an inductive hypothesis h drawn from the hypothesis space H = {[CONCEPT(x) ⇔ S(A,B,…)]}]
Size of Hypothesis Space
n observable predicates
2^n entries in the truth table
In the absence of any restriction (bias), there are 2^(2^n)
hypotheses to choose from
n = 6: ~2 x 10^19 hypotheses!
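The count above can be checked directly; the function below encodes the formula, nothing more:

```python
def num_hypotheses(n):
    # n boolean observable predicates give 2**n truth-table rows (possible
    # examples); an unrestricted hypothesis labels each row True or False
    # independently, so there are 2**(2**n) distinct hypotheses.
    return 2 ** (2 ** n)

print(num_hypotheses(6))  # 18446744073709551616, about 2 x 10**19
```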
Rewarded Card Example (Continued): Multiple Inductive Hypotheses
Background knowledge KB:
((r=1) ∨ … ∨ (r=10)) ⇔ NUM([r,s])
((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE([r,s])
((s=S) ∨ (s=C)) ⇔ BLACK([r,s])
((s=D) ∨ (s=H)) ⇔ RED([r,s])
Training set D:
REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
Possible inductive hypotheses:
h1 ≡ NUM(x) ∧ BLACK(x) ⇔ REWARD(x)
h2 ≡ BLACK([r,s]) ∧ ¬(r=J) ⇔ REWARD([r,s])
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s])
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s])
All four hypotheses agree with all the examples in the training set
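That all four hypotheses agree with the training set can be checked mechanically. The encoding below (cards as (rank, suit) tuples, hypotheses as predicates) is an illustrative sketch; each ⇔ hypothesis is read as "predict REWARD exactly when the condition holds".

```python
def NUM(r):    # rank is one of 1..10 (ace counted as 1, as in the slides' r=1)
    return r in range(1, 11)

def BLACK(s):  # suit is spades or clubs
    return s in ('S', 'C')

# Training set D: three rewarded cards, two explicitly unrewarded ones.
D = {(4, 'C'): True, (7, 'C'): True, (2, 'S'): True,
     (5, 'H'): False, ('J', 'S'): False}

# Each hypothesis predicts REWARD exactly when its condition holds.
def h1(r, s): return NUM(r) and BLACK(s)
def h2(r, s): return BLACK(s) and r != 'J'
def h3(r, s): return (r, s) in [(4, 'C'), (7, 'C'), (2, 'S')]
def h4(r, s): return (r, s) not in [(5, 'H'), ('J', 'S')]

def agrees(h, D):
    return all(h(r, s) == label for (r, s), label in D.items())

results = [agrees(h, D) for h in (h1, h2, h3, h4)]
print(results)  # [True, True, True, True]
```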
Inductive Bias
Need for a system of preferences – called a bias – to
compare possible hypotheses
Keep-It-Simple (KIS) Bias
If an hypothesis is too complex, it may not be worth learning
There are far fewer simple hypotheses than complex ones, hence the
hypothesis space is smaller
Examples:
Use far fewer observable predicates than suggested by the training set
Constrain the learnt predicate, e.g., to use only “high-level” observable
predicates such as NUM, FACE, BLACK, and RED and/or to have simple
syntax (e.g., conjunction of literals)
If the bias allows only sentences S that are conjunctions of k << n predicates
picked from the n observable predicates, then the size of H is O(n^k)
Version Spaces
Idea: assume you are looking for a CONJUNCTIVE CONCEPT, e.g.:
  spade A, club 7, club 9: yes
  club 8, heart 5: no
  concept: odd and black
Now notice that the set of conjunctive concepts is partially ordered by
specificity:
  [Diagram: a specificity lattice, e.g. any card < black < odd black, and
   any card < spade < odd spade < 3 of spades]
At any point, keep the most specific and least specific conjuncts
consistent with the data:
most specific:
• anything more specific misses some positive instances
• always exists -- conjoin all OK conjunctions
least specific:
• anything less specific admits some negative instances
• may not be unique -- imagine all you know is: club 4 not OK, odd black OK,
  spade OK, black not OK
Idea is to gradually merge least and most specific as data comes in.
Version Spaces: Example
The training examples (obtained incrementally):

  Card       In Target Set?
  A-spade        yes
  7-club         yes
  8-heart        no
  9-club         yes
  5-heart        no
  K-diamond      no
  6-diamond      no
  7-spade        yes

Step 0: most specific concept (msc) is the
empty set; least specific concept (lsc) is the
set of all cards.
Step 1: A-spade is found to be in target set:
  msc = {A-spade}
  lsc = set of all cards
Step 2: 7-club is found to be in target set:
  msc = odd black cards
  lsc = set of all cards
Step 3: 8-heart is not in target set:
  msc = odd black cards
  lsc = all odd cards OR all black cards
...
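The evolution of the most specific concept in Steps 0-2 can be sketched with a Find-S-style update that keeps only the feature constraints every positive example satisfies. The two features (parity, color) and the helper names are illustrative assumptions, and this sketch covers only the specific boundary, not the least specific one.

```python
def features(card):
    rank, suit = card
    return {'parity': 'odd' if rank % 2 == 1 else 'even',
            'color': 'black' if suit in ('S', 'C') else 'red'}

def generalize(msc, positive):
    # Drop every constraint the new positive example violates.
    fv = features(positive)
    return {f: v for f, v in msc.items() if fv[f] == v}

def covers(h, card):
    return all(features(card)[f] == v for f, v in h.items())

positives = [(1, 'S'), (7, 'C'), (9, 'C')]   # A-spade, 7-club, 9-club: yes
negatives = [(8, 'H'), (5, 'H')]             # 8-heart, 5-heart: no

msc = dict(features(positives[0]))           # start maximally specific
for p in positives[1:]:
    msc = generalize(msc, p)

print(msc)  # {'parity': 'odd', 'color': 'black'} -- "odd and black"
```

The resulting msc covers every positive example while still excluding both negatives, matching the "odd black cards" boundary reached in Step 2.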
Predicate as a Decision Tree
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can
be represented by the following decision tree:

  A?
  ├─ False: False
  └─ True: B?
       ├─ False: True
       └─ True: C?
            ├─ False: False
            └─ True: True

Example:
A mushroom is poisonous iff it is yellow and small, or yellow,
big and spotted
• x is a mushroom
• CONCEPT = POISONOUS
• A = YELLOW
• B = BIG
• C = SPOTTED
Decision Trees
What is a Decision Tree
it takes as input the description of a situation as a set of attributes
(features) and outputs a yes/no decision (so it represents a Boolean
function)
each leaf is labeled "positive" or "negative", each internal node is labeled
with an attribute (or feature), and each edge is labeled with a value of its
parent node's feature
Attribute-value language for examples
in many inductive tasks, especially learning decision trees, we need a
representation language for examples
each example is a finite feature vector
a concept is a decision tree where nodes are features
Decision Trees
Example: “is it a good day to play golf?”
a set of attributes and their possible values:
  outlook:     sunny, overcast, rain
  temperature: cool, mild, hot
  humidity:    high, normal
  windy:       true, false
A particular instance in the
training set might be:
<overcast, hot, normal, false>: play
In this case, the target class
is a binary attribute, so each
instance represents a positive
or a negative example.
Using Decision Trees for Classification
Examples can be classified as follows:
1. start at the root and look at the example's value for the feature specified at the current node
2. move along the edge labeled with this value
3. if you reach a leaf, return the label of the leaf
4. otherwise, repeat from step 1 at the new node
Example (a decision tree to decide whether to go play golf):

  outlook
  ├─ sunny: humidity
  │    ├─ high: no
  │    └─ normal: yes
  ├─ overcast: yes
  └─ rain: windy
       ├─ true: no
       └─ false: yes
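The golf tree and the four-step classification walk can be sketched directly; the nested-tuple encoding is an illustrative choice, not from the slides:

```python
# Internal nodes: (feature, {value: subtree}); leaves: "yes"/"no" strings.
golf_tree = ("outlook", {
    "sunny":    ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain":     ("windy", {"true": "no", "false": "yes"}),
})

def classify(tree, example):
    # Steps 1-4: look up the example's value for the current node's
    # feature, follow that edge, and stop when a leaf is reached.
    while not isinstance(tree, str):
        feature, branches = tree
        tree = branches[example[feature]]
    return tree

x = {"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": "false"}
print(classify(golf_tree, x))  # no
```

Note that temperature never appears in the tree, so it is ignored during classification even though it is part of each example.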
Classification: 3 Step Process
1. Model construction (Learning):
Each record (instance) is assumed to belong to a predefined class, as
determined by one of the attributes, called the class label
The set of records used for construction of the model is called training set
The model is usually represented in the form of classification rules (IF-THEN statements) or decision trees
2. Model Evaluation (Accuracy):
Estimate accuracy rate of the model based on a test set
The known label of test sample is compared to classified result from model
Accuracy rate: percentage of test set samples correctly classified by the
model
Test set is independent of training set; otherwise over-fitting will occur
3. Model Use (Classification):
The model is used to classify unseen instances (assigning class labels)
Predict the value of an actual attribute
Memory-Based Reasoning
Basic Idea: classify new instances based on their similarity to
instances we have seen before
also called “instance-based learning”
Simplest form of MBR: Rote Learning
learning by memorization
save all previously encountered instances; given a new instance, find the one from
the memorized set that most closely "resembles" the new one; assign the new
instance to the same class as this "nearest neighbor"
more general methods try to find k nearest neighbors rather than just one
but, how do we define “resembles?”
MBR is “lazy”
defers all of the real work until new instance is obtained; no attempts are made
to learn a generalized model from the training set
less data preprocessing and model evaluation, but more work has to be done at
classification time
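A minimal sketch of rote-learning MBR with a single nearest neighbor; the feature vectors, labels, and the use of Euclidean distance as "resembles" are illustrative assumptions:

```python
def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

# The memory: every previously encountered instance, stored verbatim.
memory = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
          ((5.0, 5.0), "B"), ((4.8, 5.2), "B")]

def classify_1nn(x):
    # All the real work happens now, at classification time: scan the
    # whole memory for the instance that most closely "resembles" x.
    _, label = min(memory, key=lambda inst: euclidean(x, inst[0]))
    return label

print(classify_1nn((1.1, 0.9)))  # A
```

The k-nearest-neighbors generalization would take the majority class among the k closest stored instances instead of just the single closest one.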
MBR & Collaborative Filtering
Collaborative Filtering or “Social Learning”
idea is to give recommendations to a user based on the “ratings” of objects by
other users
usually assumes that the rated items are objects of a similar kind (e.g., Web pages,
music, movies, etc.)
usually requires “explicit” ratings of objects by users based on a rating scale
there have been some attempts to obtain ratings implicitly based on user behavior
(mixed results; problem is that implicit ratings are often binary)
Nearest Neighbors Strategy:
Find similar users and predict ratings as the (weighted) average of those users' ratings
We can use any distance or similarity measure to compute similarity
among users (user ratings on items viewed as a vector)
In case of ratings, often the Pearson r algorithm is used to compute
correlations
MBR & Collaborative Filtering
Collaborative Filtering Example
A movie rating system
Ratings scale: 1 = “detest”; 7 = “love it”
Historical DB of users includes ratings of movies by Sally, Bob, Chris, and Lynn
Karen is a new user who has rated 3 movies, but has not yet seen “Independence
Day”; should we recommend it to her?
         Star Wars  Jurassic Park  Terminator II  Independence Day
  Sally      7            6              3                7
  Bob        7            4              4                6
  Chris      3            7              7                2
  Lynn       4            4              6                2
  Karen      7            4              3                ?
Will Karen like “Independence Day?”
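One way to answer, sketched under the assumptions that similarity is Pearson correlation on the three co-rated movies and that the prediction is a similarity-weighted average over positively correlated users only (other weighting schemes exist):

```python
# Ratings on (Star Wars, Jurassic Park, Terminator II) plus each user's
# known rating for "Independence Day", from the table above.
ratings = {"Sally": ([7, 6, 3], 7), "Bob":  ([7, 4, 4], 6),
           "Chris": ([3, 7, 7], 2), "Lynn": ([4, 4, 6], 2)}
karen = [7, 4, 3]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

sims = {u: pearson(karen, r) for u, (r, _) in ratings.items()}
neighbors = {u: s for u, s in sims.items() if s > 0}   # Sally and Bob
pred = (sum(s * ratings[u][1] for u, s in neighbors.items())
        / sum(neighbors.values()))
print(round(pred, 2))  # about 6.5 on the 1-7 scale: recommend it
```

Sally and Bob correlate positively with Karen while Chris and Lynn correlate negatively, so the prediction leans on Sally's 7 and Bob's 6 and comes out well above the scale midpoint.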
Clustering
Clustering is the process of partitioning a set of data (or objects) into a
set of meaningful sub-classes, called clusters
Helps users understand the natural grouping or structure in a data set
Cluster:
a collection of data objects that are
“similar” to one another and thus
can be treated collectively as one
group
but as a collection, they are
sufficiently different from other
groups
Clustering
unsupervised classification
no predefined classes
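The slides describe clustering generally without naming an algorithm; as one concrete illustration, here is a minimal k-means sketch (a common partitioning method) on made-up 2-D points:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = kmeans(points, 2)
```

On these well-separated points the two recovered clusters match the intuitive grouping: objects similar to one another end up together, and the two groups are clearly different from each other.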
Distance or Similarity Measures
Measuring Distance
In order to group similar items, we need a way to measure the distance
between objects (e.g., records)
Note: distance = inverse of similarity
Often based on the representation of objects as “feature vectors”
An Employee DB

  ID  Gender  Age   Salary
   1    F      27   19,000
   2    M      51   64,000
   3    M      52  100,000
   4    F      33   55,000
   5    M      45   45,000

Term Frequencies for Documents

         T1  T2  T3  T4  T5  T6
  Doc1    0   4   0   0   0   2
  Doc2    3   1   4   3   1   2
  Doc3    3   0   0   0   3   0
  Doc4    0   1   0   3   0   0
  Doc5    2   2   2   3   1   4
Distance or Similarity Measures
Common Distance Measures:
Manhattan distance:  dist(X,Y) = Σᵢ |xᵢ - yᵢ|
Euclidean distance:  dist(X,Y) = √(Σᵢ (xᵢ - yᵢ)²)
Cosine similarity:   dist(X,Y) = 1 - sim(X,Y), where
                     sim(X,Y) = (Σᵢ xᵢyᵢ) / (√(Σᵢ xᵢ²) · √(Σᵢ yᵢ²))
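The three measures, sketched as plain functions and applied, for illustration, to the term-frequency vectors for Doc1 and Doc5 from the earlier table:

```python
from math import sqrt

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_sim(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def cosine_dist(x, y):          # dist(X,Y) = 1 - sim(X,Y)
    return 1 - cosine_sim(x, y)

d1 = (0, 4, 0, 0, 0, 2)   # Doc1's term frequencies
d5 = (2, 2, 2, 3, 1, 4)   # Doc5's term frequencies
print(manhattan(d1, d5))  # 12
```

Note that cosine similarity depends only on the angle between the vectors, not their lengths, which is why it is popular for comparing documents of different sizes.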
What Is Good Clustering?
A good clustering will produce high quality clusters in
which:
the intra-class (that is, intra-cluster) similarity is high
the inter-class similarity is low
The quality of a clustering result also depends on both
the similarity measure used by the method and its
implementation
The quality of a clustering method is also measured by
its ability to discover some or all of the hidden patterns
The quality of a clustering result also depends on the
definition and representation of cluster chosen
Applications of Clustering
Clustering has wide applications in Pattern Recognition
Spatial Data Analysis:
create thematic maps in GIS by clustering feature spaces
detect spatial clusters and explain them in spatial data mining
Image Processing
Market Research
Information Retrieval
Document or term categorization
Information visualization and IR interfaces
Web Mining
Cluster Web usage data to discover groups of similar access patterns
Web Personalization
Learning by Discovery
One example: AM by Doug Lenat at Stanford
a mathematical system
inputs: set theory (union, intersection, etc); “how to do mathematics” (based on a book by
Polya), e.g., if f is an interesting function of two arguments, then f(x,x) is an interesting
function on one, etc.
speculated about what was interesting and made conjectures, etc.
What AM discovered
integers (as equivalence relation on cardinality of sets)
addition (using disjoint union of sets)
multiplication
primes: 1 was interesting, the function returning the cardinality of the set of divisors was
interesting, etc.
Goldbach's conjecture: "every even number greater than 2 is the sum of two primes"; (note that AM
did not prove it, just discovered that it was interesting)
Why was AM so successful?
Connection between LISP and mathematics (mutations of small bits of LISP code are likely
to be interesting)
Doesn’t extend to other domains
Lessons from EURISKO (fleet game)
Explanation-Based Learning
Explanation-based learning (EBL) systems try to explain why
each training instance belongs to the target concept.
The resulting “proof” is then generalized and saved.
If a new instance can be explained in the same manner as a previous instance,
then it is also assumed to be a member of the target concept.
Like macro-operators, EBL systems never learn to solve a
problem that they couldn't solve before (in principle).
However, they can become much more efficient at problem-solving by
reorganizing the search space.
One of the strengths of EBL is that the resulting “explanations”
are typically easy to understand.
One of the weaknesses of EBL is that it relies on a domain
theory to generate the explanations.
Case-Based Learning
Case-based reasoning (CBR) systems keep track of previously
seen instances and apply them directly to new ones.
In general, a CBR system simply stores each “case” that it
experiences in a “case base” which represents its memory of
previous episodes.
To reason about a new instance, the system consults its case base
and finds the most similar case that it’s seen before. The old case
is then adapted and applied to the new situation.
CBR is similar to reasoning by analogy. Many people believe
that much of human learning is case-based in nature.
Connectionist Algorithms
Connectionist models (also called neural networks) are inspired
by the interconnectivity of the brain.
Connectionist networks typically consist of many nodes that are highly
interconnected. When a node is activated, it sends signals to other nodes so that
they are activated in turn.
Using layers of nodes allows connectionist models to learn fairly
complex functions.
Neural networks are loosely modeled after the biological
processes involved in cognition:
1. Information processing involves many simple elements called neurons.
2. Signals are transmitted between neurons using connecting links.
3. Each link has a weight that controls the strength of its signal.
4. Each neuron applies an activation function to the input that it receives from
other neurons. This function determines its output.
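Points 1-4 can be condensed into a single-neuron sketch; the sigmoid activation and the specific weights are illustrative choices, not from the slides:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def neuron(inputs, weights, bias):
    # Signals arrive over weighted links (points 2-3); the unit applies
    # an activation function to their sum to produce its output (point 4).
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

out = neuron([1.0, 0.0], [2.0, -3.0], -2.0)
print(out)  # sigmoid(2*1 - 3*0 - 2) = sigmoid(0) = 0.5
```

Layering many such units, with each layer's outputs feeding the next layer's inputs, is what lets connectionist models represent fairly complex functions.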