IAI REVISION 2012
Turing Test
Turing (1950), "Computing Machinery and Intelligence":
• Can machines think? Can machines behave intelligently?
• Predicted that by 2000, a machine might have a 30% chance of fooling a lay person for 5 minutes.
• Suggested major components of AI: knowledge, reasoning, language understanding, learning.
• Problems: the Turing test is not reproducible, constructive, or amenable to mathematical analysis.
AI is not trying to copy humans
• “artificial flight” was successful because the
Wright brothers stopped mimicking birds.
• We don’t want to copy pigeons.
• Where else is the idea of a “gliding wing” and
a propeller used in nature?
Laws of Thought
"Socrates is a man; all men are mortal; therefore Socrates is mortal." LOGIC
By 1965, computer programs existed that could in principle solve any solvable problem described in logical notation (however, if no solution exists, the program would not terminate).
Problems: How do we formally state real-world problems? Some problems take too long to solve exactly.
Economics
• How do we make decisions to maximize payoff (utility, money, happiness)?
• How do we do this when others cooperate or do not cooperate (criminals)?
• What if the reward is not immediate, but delayed far into the future?
• Decision theory / game theory / operations research.
History of AI
McCulloch and Pitts (1943): the on/off artificial neuron.
Hebb (1949): Hebbian learning rule.
Turing (1950): "Computing Machinery and Intelligence".
Newell and Simon (1976): physical symbol system hypothesis.
Samuel (1952): checkers player; the program learned to play better than its creator. I CAN TELL YOU HOW IN THE DISCUSSION THIS AFTERNOON
Game playing
• IBM's Deep Blue defeated the world champion Garry Kasparov.
• "a new kind of intelligence"
• IBM's stock increased by $18 billion USD.
• By studying this, chess players could draw!!!
• Recently, computers have become much better.
• But what about Go, or other games?
Vacuum-Cleaner
• The vacuum agent perceives which square it is in (A or B) and whether it is clean or dirty.
• It has actions: move left/right, suck, do nothing.
• One simple agent function: if the current square is dirty, then suck, else move to the other square.
• We can write perceived state and action pairs:
• [A, Clean] right (if in A && clean, then move right)
• [A, Dirty] suck (if in A && dirty, then suck)
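The condition-action pairs above can be written directly as a simple reflex agent. This is a minimal sketch (names and signatures are illustrative assumptions, not code from the module):

```python
# Simple reflex vacuum agent: the percept is (location, status).
def vacuum_agent(location, status):
    """Return an action from the condition-action table."""
    if status == "Dirty":
        return "Suck"           # [A, Dirty] suck / [B, Dirty] suck
    elif location == "A":
        return "Right"          # [A, Clean] right
    else:
        return "Left"           # [B, Clean] left
```

Note that the whole agent is just a lookup from percept to action; it keeps no internal state.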
States and Actions
• A state is a description of the world now.
• An action is a transition from one state to another.
• Not exactly the same, but in Java, instance variables are like state, e.g. person = {name, age}.
• An action (a Java method) changes the state, e.g. via get/set methods.
Formulation of Problem Domain
• State: [l, c, d] means robot=left, left room is clean, right room is dirty. Or in binary: [0, 1, 0].
• Initial state: [l, d, d]
• Actions: move left/right, suck.
• Transition diagram: next slide.
• Goal states: {[l, c, c], [r, c, c]}
• Path cost: number of actions (maybe sucking takes twice as much energy as moving??)
State transition diagram for vacuum cleaner world.
Note – some actions are reversible and some are not – which?
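The transition model behind the diagram can be sketched as a function from (state, action) to state. This is an illustrative assumption (the slides give only the diagram), with states written as (position, left-status, right-status):

```python
# Transition model for the vacuum world; statuses are 'c' (clean) or 'd' (dirty).
def result(state, action):
    pos, left, right = state
    if action == "suck":
        # Sucking cleans the current square; there is no action that makes it dirty again.
        return ("l", "c", right) if pos == "l" else ("r", left, "c")
    if action == "left":
        return ("l", left, right)
    if action == "right":
        return ("r", left, right)
    return state                 # "do nothing"
```

This makes the reversibility question concrete: moving left/right can be undone by moving back, but sucking cannot, because no action re-dirties a square.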
2.3.2 Properties of task environment
• Fully observable vs. partially observable.
• Single agent vs. multiple agent. (competitive
vs. cooperative)
• Deterministic vs. stochastic.
• Episodic vs. sequential.
• Discrete vs. continuous.
• Known vs. unknown.
2.4.2 simple reflex agents
IF condition THEN action.
Human reflexes, e.g. blinking, knee jerk.
A fly avoids getting swatted by a human.
Other examples.
2.4.3 model-based reflex agents
Example: we gasp for breath, even under water.
A fly will move if we try to swat it. BLUSHING.
2.4.4 goal-based agents
• There is a desirable goal state of the world.
• Goal, e.g. crossing a road.
• Children and orange juice in tall/short glass.
2.4.6 general learning agent
Instead of writing the components ourselves, why not let the machine learn?
Turing (1950).
http://www.youtube.com/watch?v=lrYPm6DD44M&feature=relmfu
http://www.youtube.com/watch?v=BGPGknpq3e0
Search
UNINFORMED SEARCH
• Depth-first and breadth-first search.
• Uniform cost search (expands the node with the cheapest cost so far – how far travelled so far). Function g(n).
INFORMED SEARCH
• Greedy (expands the node that appears closest to the goal according to some information). Function h(n).
• A* (A star): f(n) = g(n) + h(n)
Differences in Search methods
• All of them work the same way!!!
• The only difference is the order in which they sort the list.
• Depth-first – LIFO (stack); breadth-first – FIFO (queue).
• Greedy h(n), uniform cost g(n), A* f(n) = g(n) + h(n).
• Uninformed search – looks in all directions (no knowledge of where the goal is).
• Informed search – is directed toward the goal by information, e.g. straight-line distance to the goal city.
Map of Romania
Breadth First Search
Depth First Search
Greedy Search
A* search
• Expand cheapest according to distance travelled so far + expected distance to travel.
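The point that the methods "all work the same way" and differ only in how the frontier is ordered can be sketched as one best-first search with a pluggable priority. The tiny graph and heuristic values below are illustrative assumptions, not the Romania map:

```python
import heapq

def best_first_search(start, goal, graph, priority):
    """Generic best-first search; priority(g, node) gives the sort key for the frontier."""
    frontier = [(priority(0, start), 0, start, [start])]
    explored = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in explored:
            continue
        explored.add(node)
        for neighbour, step in graph.get(node, []):
            heapq.heappush(frontier,
                           (priority(g + step, neighbour), g + step, neighbour, path + [neighbour]))
    return None, float("inf")

# Illustrative graph (edge costs) and heuristic h (estimated distance to goal G):
graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)]}
h = {"S": 4, "A": 5, "B": 1, "G": 0}

uniform = lambda g, n: g         # uniform cost: order by g(n)
greedy  = lambda g, n: h[n]      # greedy: order by h(n)
astar   = lambda g, n: g + h[n]  # A*: order by f(n) = g(n) + h(n)
```

Swapping the `priority` function is the only change needed to move between uniform cost, greedy, and A*.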
Resolution – summary
If we know (knowledge base):
• A or B
• Not B
Then we can conclude (the knowledge base resolves to):
• A
The propositions must be in CNF (conjunctive normal form).
We add the negation of what we want to prove.
If we get a contradiction (false), then the theorem/proposition is true. (This is called proof by contradiction.)
Resolution Algorithm – small example
Is it sunny? sunny = TRUE?
Knowledge base:
• sunny
• daytime
• sunny V night
To prove sunny: negate it and add ¬sunny to the knowledge base.
Resolving sunny with ¬sunny gives a CONTRADICTION,
so ¬sunny = FALSE.
Therefore: sunny = TRUE
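The proof-by-contradiction loop above can be sketched in code. This is a minimal illustration (not the module's algorithm as given): clauses are sets of literals, with "~p" standing for ¬p.

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses (frozensets of literals)."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def entails(kb, query):
    """Add the negated query; the empty clause means contradiction, so the query holds."""
    clauses = set(kb) | {frozenset([negate(query)])}
    while True:
        new = set()
        for a in clauses:
            for b in clauses:
                if a == b:
                    continue
                for r in resolve(a, b):
                    if not r:            # empty clause: CONTRADICTION
                        return True
                    new.add(frozenset(r))
        if new <= clauses:               # nothing new can be derived
            return False
        clauses |= new
```

With the knowledge base {sunny, daytime, sunny V night}, resolving the added ¬sunny against sunny yields the empty clause, so sunny is entailed.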
Procedure for converting to CNF
• (a) To eliminate ↔:
  – (a ↔ b) ≡ (a → b) Λ (b → a)
• (b) To eliminate →:
  – (a → b) ≡ ¬a ν b
• (c) Double negation: ¬(¬a) ≡ a
• (d) De Morgan:
  – ¬(a Λ b) ≡ (¬a ν ¬b);  ¬(a ν b) ≡ (¬a Λ ¬b)
• (e) Distributivity of Λ over ν:
  – (a Λ (b ν c)) ≡ ((a Λ b) ν (a Λ c))
• (f) Distributivity of ν over Λ:
  – (a ν (b Λ c)) ≡ ((a ν b) Λ (a ν c))
Two player games
MinMax
Alpha beta pruning
• Pruning – means cutting off redundant parts.
• Typically we "prune a tree".
• MinMax considers all possibilities; however, using α-β pruning we can skip branches that cannot change the final decision.
α-β pruning example
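Minimax with α-β pruning can be sketched on an explicit game tree. This is an illustrative sketch (the tree values are assumptions, not the slides' example): internal nodes are lists of children, leaves are static evaluations.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):       # leaf: return its evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                # cut-off: MIN will never allow this branch
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:                # cut-off: MAX will never choose this branch
                break
        return value

# MAX at the root, MIN at the next level; the pruned result equals full MinMax.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

On this tree the second MIN node is cut off after its first leaf (2 ≤ 3 already known at the root), yet the answer is identical to searching everything.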
Learning By Example
• Perceptrons (single layer) – linearly separable data.
• Artificial Neural Networks (multilayer perceptrons – usually 2 or 3 layers).
• Support Vector Machines – linearly separable data.
• Project/transform into a higher dimensional space, e.g. 2D to 3D, and re-represent – then apply a Support Vector Machine.
The First Neural Networks
It consisted of:
• A set of inputs – (dendrites)
• A set of weights – (synapses)
• A processing element – (neuron)
• A single output – (axon)
G51IAI – Introduction to AI
McCulloch and Pitts Networks
[Figure: inputs X1 and X2 each connect to neuron Y with weight 2; input X3 connects with weight -1]
The activation of a neuron is binary. That is, the neuron either fires (activation of one) or does not fire (activation of zero).
θ = threshold
Output function:
If (input sum < Threshold)
    output 0
Else
    output 1
Each neuron has a fixed threshold. If the net input into the neuron is greater than or equal to the threshold, the neuron fires.
Neurons in a McCulloch-Pitts network are connected by directed, weighted paths.
If the weight on a path is positive the path is excitatory, otherwise it is inhibitory.
x1 and x2 encourage the neuron to fire; x3 prevents the neuron from firing.
The threshold is set such that any non-zero inhibitory input will prevent the neuron from firing.
(This is only a rule for McCulloch-Pitts Networks!!)
It takes one time step for a signal to pass over one connection.
Worked Examples on Handout 1
Does this neuron fire? Does it output a 0 or a 1?
[Figure: a three-input neuron; the weighted inputs sum to 3.5 and Threshold(θ) = 4]
3.5 < 4, so the neuron outputs 0.
Threshold Function:
If input sum < Threshold
    return 0
Else
    return 1
1) Multiply the inputs to the neuron by the weights on their paths
2) Add the inputs
3) Apply the threshold function
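The three-step procedure is small enough to write out directly. A minimal sketch (an illustration, not the handout's code):

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: 1) multiply inputs by weights, 2) sum, 3) threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 0 if total < threshold else 1
```

Every gate in the answers below (AND, OR, AND NOT) is this one function with different weights and a different threshold.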
Answers
• Using the McCulloch-Pitts model we can model some logic functions.
• In the exercise, you have been working on the logic functions AND, OR, and NOT AND.
Answers – AND Function
[Figure: X and Y connect to Z with weight 1 each; Threshold(θ) = 2]
Threshold Function:
If input sum < Threshold
    return 0
Else
    return 1
X Y | Z
1 1 | 1
1 0 | 0
0 1 | 0
0 0 | 0
Answers – OR Function
[Figure: X and Y connect to Z with weight 2 each; Threshold(θ) = 2]
Threshold Function:
If input sum < Threshold
    return 0
Else
    return 1
X Y | Z
1 1 | 1
1 0 | 1
0 1 | 1
0 0 | 0
Answers (This one is not a McCulloch-Pitts Network)
NOT AND (NAND) Function
[Figure: X and Y connect to Z with weight -1 each; Threshold(θ) = -1]
Threshold Function:
If input sum < Threshold
    return 0
Else
    return 1
X Y | Z
1 1 | 0
1 0 | 1
0 1 | 1
0 0 | 1
One additional example
AND NOT Function (X AND NOT Y)
[Figure: X connects to Z with weight 2, Y with weight -1; Threshold(θ) = 2]
Threshold Function:
If input sum < Threshold
    return 0
Else
    return 1
X Y | Z
1 1 | 0
1 0 | 1
0 1 | 0
0 0 | 0
Multi-Layer Neural Networks
Modelling Logic Functions – XOR
[Figure: X1 and X2 feed hidden units Y1 and Y2 (weight 2 on the direct paths, -1 on the crossed paths); Y1 and Y2 feed Z with weight 2 each]
X1 X2 | Z
 1  1 | 0
 1  0 | 1
 0  1 | 1
 0  0 | 0
XOR Function
X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)
Modelling Logic Functions
[Figure: Y2 = X2 AND NOT X1 – X2 connects to Y2 with weight 2, X1 with weight -1]
X1 X2 | Y2
 1  1 | 0
 1  0 | 0
 0  1 | 1
 0  0 | 0
X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)
Modelling Logic Functions
[Figure: Y1 = X1 AND NOT X2 – X1 connects to Y1 with weight 2, X2 with weight -1]
X1 X2 | Y1
 1  1 | 0
 1  0 | 1
 0  1 | 0
 0  0 | 0
X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)
Modelling Logic Functions – XOR
[Figure: hidden units Y1 and Y2 connect to Z with weight 2 each – the OR gate]
Overall XOR:
X1 X2 | Z
 1  1 | 0
 1  0 | 1
 0  1 | 1
 0  0 | 0
OR gate:
Y1 Y2 | Z
 1  1 | 1
 1  0 | 1
 0  1 | 1
 0  0 | 0
X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)
Modelling Logic Functions
X1 X2 | Y1 Y2 | Z
 1  1 |  0  0 | 0
 1  0 |  1  0 | 1
 0  1 |  0  1 | 1
 0  0 |  0  0 | 0
X1 XOR X2 = (X1 AND NOT X2) OR (X2 AND NOT X1)
Key Idea!
• Perceptrons cannot learn (cannot
even represent) the XOR function
• Multi-Layer Networks can, as we have
just shown
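The two-layer XOR construction above can be wired up directly from threshold units. A small sketch (an illustration using the weights from the slides, not the slides' own code):

```python
def unit(inputs, weights, threshold):
    """McCulloch-Pitts style threshold unit."""
    return 0 if sum(i * w for i, w in zip(inputs, weights)) < threshold else 1

def xor(x1, x2):
    y1 = unit([x1, x2], [2, -1], 2)   # Y1 = X1 AND NOT X2
    y2 = unit([x2, x1], [2, -1], 2)   # Y2 = X2 AND NOT X1
    return unit([y1, y2], [2, 2], 2)  # Z  = Y1 OR Y2
```

No single `unit` call can compute XOR (it is not linearly separable), but composing three of them in two layers does.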
First training step
• We wanted 1.
• We got 0.
• Error = 1 – 0 = 1
X Y | Z (target)
1 1 | 1
1 0 | 0
0 1 | 0
0 0 | 0
While epoch produces an error
    Present network with next inputs (pattern) from epoch
    Err = T – O
    If Err <> 0 then
        Wj = Wj + LR * Ij * Err
    End If
End While
If there IS an error, then we change ALL the weights in the network.
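The training rule above can be sketched in Python. This is a minimal illustration under assumed settings (learning rate 0.1, a constant bias input of 1, firing threshold at 0), not the module's code:

```python
def train_perceptron(patterns, lr=0.1, epochs=100):
    """patterns: list of (inputs, target). Returns the learned weights (last one is bias)."""
    n = len(patterns[0][0]) + 1
    w = [0.0] * n
    for _ in range(epochs):
        error_free = True
        for inputs, target in patterns:
            x = list(inputs) + [1]                              # constant bias input
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            err = target - out                                  # Err = T - O
            if err != 0:                                        # change ALL the weights
                error_free = False
                w = [wi + lr * xi * err for wi, xi in zip(w, x)]  # Wj = Wj + LR * Ij * Err
        if error_free:                                          # epoch produced no error
            break
    return w

# The AND target table from the slide:
and_patterns = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
```

Because AND is linearly separable, the loop terminates with weights that classify all four patterns correctly.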
Finding the weights.
• The weights w0 and w1 have a smooth (continuous and differentiable) error surface over w·x.
• The best value is unique.
• We can gradually move toward the global optimum.
• LOSS = error
[Figure: descent on the error surface with a small learning rate vs. a large learning rate]
Transform from 2D to 3D
A non-linear decision boundary in 2D becomes a linear decision boundary in 3D.
A 2D (x1, x2) coordinate maps to a 3D coordinate (f1, f2, f3):
f1 = x1*x1
f2 = x2*x2
f3 = 1.41*x1*x2
CLASS EXERCISE – DO THE FOLLOWING EXAMPLES – NEXT SLIDE
(0,0) -> (?,?,?)
(0,1) -> (?,?,?)
(-1, -1) -> (?,?,?)
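The feature map from the slide can be written as a one-line function, which you can use to check your answers to the class exercise:

```python
def to_3d(x1, x2):
    """Map a 2D point (x1, x2) to 3D features (f1, f2, f3) as on the slide."""
    return (x1 * x1, x2 * x2, 1.41 * x1 * x2)
```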
1, 2 or 3 layer Neural Networks
• One layer (a perceptron) defines a linear surface.
• Two layers can define convex hulls (and therefore any Boolean function).
• Just three layers can define any function!!
In general, for the 2- and 3-layer cases, there is no simple way to determine the weights.
Terminology of Support Vector Machines
[Figure: two classes of points (o and x) separated by the maximum margin separator; the points lying on the margin boundaries are the support vectors, and the gap between the boundaries is the margin]
Disadvantages of ANN
• We can look at someone else's Java program and try to understand it (it may not have comments and correct indentation – but we should understand it a little).
• An ANN is a jumble of numbers and is difficult to understand. Sometimes humans do not have confidence in them because they are difficult to explain.
Advantages of SVM
A perceptron depends on the initial weights and the learning rate.
A perceptron may give a different answer each time – an SVM gives a unique and best answer.
A perceptron can oscillate when training – and will not converge if the data is not linearly separable. An SVM will find the best solution it can – given the data it has.
Evolution and Genetic Algorithms
• Evolution occurs in 3 parts:
• Selection, inheritance, mutation.
• We could artificially select e.g. tall people – and over time – people would probably get taller. If we do not select in one direction – there would be no reason to change. Selection is usually caused by the environment or man.
• Inheritance means you look like your parents.
• Mutation – introduces new genetic material into the gene pool.
Examples of "Problem Solving" ability of Evolution
• How can we eat meat but we are not digesting ourselves?
• How do ants find their way back to a nest – or birds migrate over vast distances?
• Bears and other animals hibernate in the winter to save energy.
• Symbiotic relationships…
GA and state space
• The state space is the space of bit-strings.
• FORMULATE THE PROBLEM
• State: 0110 in knapsack – include items 2 and 3 and not items 1 and 4.
• Initial state: any random starting point, e.g. 1101.
• Action: generate a new bit string (select–mutate).
• Transition diagram: next slide – one-bit mutation.
• Goal states: the ones with the best value.
• Path cost: number of actions (mutations) applied.
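The select-mutate loop over bit-strings can be sketched for a tiny knapsack. The item values, weights, and capacity below are illustrative assumptions, not from the slides:

```python
import random

values  = [10, 5, 8, 3]
weights = [4, 2, 3, 1]
CAPACITY = 6

def fitness(bits):
    """Total value of included items; infeasible (over-capacity) states score 0."""
    w = sum(wi for b, wi in zip(bits, weights) if b)
    v = sum(vi for b, vi in zip(bits, values) if b)
    return v if w <= CAPACITY else 0

def mutate(bits):
    """One-bit mutation: flip a single randomly chosen bit."""
    i = random.randrange(len(bits))
    child = list(bits)
    child[i] = 1 - child[i]
    return child

def evolve(steps=1000, seed=0):
    """Stochastic search: keep the mutated child if it is no worse (selection)."""
    random.seed(seed)
    state = [random.randint(0, 1) for _ in range(4)]
    for _ in range(steps):
        child = mutate(state)
        if fitness(child) >= fitness(state):
            state = child
    return state, fitness(state)
```

Because the mutation step is random, different runs (different seeds) can give different answers, which is exactly the stochastic behaviour the next slide contrasts with un/informed search.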
GA as search
• We can enumerate bit strings in different ways!
• A GA can be thought of as a search process –
however unlike un/informed search methods – it
is stochastic so does not give the same answer
each time.
• With un/informed search we ORDER or SORT the
list by g(n) + h(n).
• With GA we let evolution provide the ordering for
us.
Genetic Algorithms and Artificial Neural Networks.
• We can code an ANN as a bit-string.
• We have a population of ANNs.
• (1.2, 4.1, -2.5, ….) is a list describing the weights in a network.
• Each number is changed a little bit – so the network behaves like its parents did (i.e. not totally different).
• "Like father like son"
• (Boys …look at your girlfriend's mother)
(1.2, 4.1, -2.5, ….)
A linear list of numbers represents the weights in a neural network.