Artificial Intelligence
CS 165A
Thursday, October 18, 2007
▪ Informed (heuristic) search methods (Ch 4)
▪ Adversarial search – games (Ch 6)
Notes
• Reading list typo
– Skip Chapter 5, read Chapter 6
• HW#1, problem 4: Tic-Tac-Toe
– “It’s not whether you win or lose, it’s how you play the game!”
– Implement breadth-first and depth-first approaches looking for
your goal (three in a row for you)
– If you want to also avoid losing (three in a row for your opponent),
you may, but this is optional
– In depth-first search, you should stop searching when you find a
goal state (i.e., don’t keep looking for the “best” goal)
Review
Informed Search Methods
• Best-first search
– Greedy best-first search
– A* search
• Memory-bounded search (not covered in detail)
– IDA*
– SMA*
• Iterative improvement algorithms
– Hill-climbing
– Simulated annealing
As with blind search, the strategy is defined by choosing the
order of node expansion
Review
A* Search
• “A* Search” is A Search with an admissible h
– h is optimistic – it never overestimates the cost to the goal
▪ h(n) ≤ true cost to reach the goal
▪ h is a "Pollyanna"
– So f(n) = g(n) + h(n) never overestimates the actual cost of the best
solution passing through node n
function A*-SEARCH(problem, h) returns a solution or failure
  return BEST-FIRST-SEARCH(problem, f)
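To make the review concrete, here is a minimal Python sketch of best-first search ordered by f(n) = g(n) + h(n); the problem interface (successors, h, is_goal) is an illustrative assumption, not the book's API:

    import heapq
    import itertools

    def a_star_search(start, successors, h, is_goal):
        # Best-first search ordered by f(n) = g(n) + h(n).
        # successors(state) yields (next_state, step_cost) pairs.
        tie = itertools.count()          # tie-breaker so the heap never compares states
        frontier = [(h(start), next(tie), 0, start, [start])]
        best_g = {start: 0}              # cheapest known cost to reach each state
        while frontier:
            f, _, g, state, path = heapq.heappop(frontier)
            if is_goal(state):
                return path, g           # optimal when h is admissible and consistent
            for nxt, cost in successors(state):
                g2 = g + cost
                if g2 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g2
                    heapq.heappush(frontier,
                                   (g2 + h(nxt), next(tie), g2, nxt, path + [nxt]))
        return None                      # failure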
Thursday Quiz
1. How does an online search agent operate?
2. How does it differ from an offline search agent?
See Sec. 4.5
Simulated Annealing
• Similar to hill-climbing
– But includes a random element
– Sometimes take “bad” steps to escape local maxima
• Motivated by the roughly analogous physical process of
annealing
– Heating and then slowly cooling a substance to obtain a strong
crystalline structure
– T is temperature, E is energy
▪ A schedule determines the rate at which T is lowered
▪ Want to maximize the energy E
– E(state) measures the state's "goodness"
– ΔE measures the increase in "goodness" resulting from a new state
Simulated Annealing (cont.)
• For the current state, evaluate the operators/actions
• Instead of taking the best action, choose an action at
random
– If that action results in a higher "goodness" value (ΔE > 0), take it
– Otherwise, take it with probability between 0 and 1
[Figure: candidate actions labeled with ΔE = 1, ΔE = −1, ΔE = 3, ΔE = −2, etc.]
Probabilistic action
• ΔE = Energy(action) − Energy(current)
– If ΔE > 0, take the action
– If ΔE < 0, maybe take it, maybe not (probabilistic action)
• P(action) = e^(ΔE/T) (for negative ΔE)
– T: determined by the "schedule"
▪ Starts large, decreases toward zero as the algorithm iterates
▪ Large T means ???
▪ Small T means ???
– What happens when T = 0?
Probabilistic action
• What does it mean to take an action with probability
P(A) = e^(ΔE/T)?
• For action A1
– Let ΔE = −3 and T = 20
– P(A1) = e^(−3/20) ≈ 0.86
• Choose a random number r ∈ [0, 1]
– If r < 0.86, take action A1
– Else do nothing
• For action A2
– ΔE = −9 and T = 20
– P(A2) = e^(−9/20) ≈ 0.64
• Choose a random number r ∈ [0, 1]
– If r < 0.64, take action A2
– Else do nothing
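A minimal Python sketch of this accept/reject step, reusing the numbers above (maybe_take is an illustrative name):

    import math
    import random

    def maybe_take(delta_e, T):
        # Improving actions (dE > 0) are always taken; worsening ones
        # are taken with probability e^(dE/T).
        if delta_e > 0:
            return True
        return random.random() < math.exp(delta_e / T)

    print(math.exp(-3 / 20))   # P(A1) = e^(-3/20), about 0.86
    print(math.exp(-9 / 20))   # P(A2) = e^(-9/20), about 0.64
    print(maybe_take(-3, 20))  # True about 86% of the time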
[Plot: P(action) = e^(ΔE/T) on the vertical axis (0 to 1) against the "badness" of the action, ΔE, on the horizontal axis; the curve drops toward 0 as T decreases]
Simulated Annealing (cont.)
Intuition of SA
• Consider a ball bouncing around a vibrating surface, which
you are holding with your eyes closed
– Goal: Get the ball to the deepest point of the surface
– What’s your strategy?
– Use the vibration to jiggle the ball over the surface
▪ Use lots of vibration at the beginning
– Get big jumps, as if the surface were smoother
▪ Use less vibration later
– Get smaller jumps as you feel the smaller surface bumps
– Analogy to annealing
▪ T measures the amount of vibration
▪ E measures the depth of the ball (goodness of solution)
Example: Get the ball to the minimum
[Figure sequence: Large T → Smaller T → Even smaller T → T = 0, the ball settling deeper as the vibration decreases]
Simulated Annealing (cont.)
• The trick is coming up with a proper schedule (a full loop is sketched below)
– Where to start (what T)?
– How quickly to lower T?
▪ Too quickly?
▪ Too slowly?
• SA is an example of a Monte Carlo method
– A technique that uses probabilistic calculations to find an
approximate solution to a complex (perhaps deterministic) problem
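A minimal sketch of the whole loop, assuming a geometric cooling schedule (the parameters T0, cooling, and T_min are illustrative, not prescribed by the text):

    import math
    import random

    def simulated_annealing(state, energy, random_neighbor,
                            T0=10.0, cooling=0.99, T_min=1e-3):
        # energy(state) is the "goodness" E we want to maximize;
        # random_neighbor(state) returns a randomly chosen successor.
        T = T0
        while T > T_min:
            candidate = random_neighbor(state)
            delta_e = energy(candidate) - energy(state)
            # Uphill moves are always taken; downhill with probability e^(dE/T)
            if delta_e > 0 or random.random() < math.exp(delta_e / T):
                state = candidate
            T *= cooling             # the schedule: lower T a little each step
        return state

With a geometric schedule T only approaches zero; once it is tiny, e^(ΔE/T) vanishes for any ΔE < 0 and the loop degenerates into plain hill-climbing, which answers the T = 0 question above.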
Effective Branching Factor
• Though informed search methods may have poor worst-case performance,
they often do quite well if the heuristic is good
– Even if there is a huge branching factor
• One way to quantify the effectiveness of the heuristic: the
effective branching factor, b*
– N: total number of nodes expanded
– d: solution depth
– N = 1 + b* + (b*)^2 + … + (b*)^d
• For a good heuristic, b* is close to 1 (a numeric solver is sketched below)
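The equation has no closed-form solution for b*, so it is natural to solve it numerically; a small bisection sketch, using the slide's form of the formula (function name illustrative):

    def effective_branching_factor(N, d, tol=1e-6):
        # Solve N = 1 + b + b^2 + ... + b^d for b by bisection.
        def total(b):
            return sum(b ** i for i in range(d + 1))
        lo, hi = 1.0, float(N)       # the root lies between 1 and N
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if total(mid) < N:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    print(effective_branching_factor(52, 5))   # ~1.91 for 52 nodes, depth-5 solution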
Example: 8-puzzle problem
[Chart: average number of nodes expanded vs. solution length, averaged over 100 trials at each solution length]
Local beam search
• In simulated annealing, how many nodes do you keep in
memory?
– One
• Local beam search keeps track of k states rather than just one (sketched below)
– Start with a population of k (initially random?) candidates
– Choose the k best descendants
▪ Not the same as choosing and following k independent solutions
– Why?
▪ Analogy to the beam of a flashlight (vs. a single ray of light)
• Stochastic forms of beam search
– Like genetic algorithms, where successor states are generated by
combining two parent states
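A minimal sketch of the idea (the helper names and step bound are illustrative assumptions):

    def local_beam_search(k, random_state, successors, value, steps=100):
        # Keep k states at once; pool ALL their successors and keep the
        # k best. Unlike k independent searches, the whole beam is drawn
        # toward whichever states currently look most promising.
        beam = [random_state() for _ in range(k)]
        for _ in range(steps):
            pool = [s2 for s in beam for s2 in successors(s)]
            if not pool:
                break
            beam = sorted(pool, key=value, reverse=True)[:k]
        return max(beam, key=value)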
I said that iterative improvement algorithms are like this: [figure]
Hill-climbing and SA are really more like this: [figure]
Local beam search is really more like this: [figure]
Review: Problem solving and search
• AI is about solving hard problems, often requiring
searching through a problem space
• Problem definition
– The choice of problem space is very important
▪ Abstraction
– The choice of data structure is very important
▪ Can make tasks much easier or harder
• Keys: Search graph and list of nodes to be expanded
• Toy problems (M&C, etc.) are useful to show and compare
algorithms
– Real problems are not so simple!
Review: Problem solving and search
• Choose a search algorithm depending on the problem
requirements
• There are various ways to evaluate search algorithms
– Completeness
– Optimality
– Goodness of solution
– Cost of finding solution (computation and memory requirements)
• Uninformed search is exponential
– We can do better, on average, if we know something about the
problem (hence, informed search using heuristics)
– The real difference: Effective branching factor is usually smaller
for informed search methods
But alas, things are not so simple…
• All this is fairly straightforward for M&C, N-queens, planning travel
to Bucharest, etc.
– What about dynamic, uncertain environments; stochastic inputs, etc.?
• What if we also know that
– Missionaries try to convert cannibals, and sometimes succeed
– Converted cannibals don't eat missionaries
– Unconverted cannibals and converted cannibals argue a lot
– Even unconverted cannibals would rather eat fish than eat missionaries
– Missionaries can catch fish with their fancy traps
– Missionaries won't fish on Sundays and have no spare fish on Fridays
– Some cannibals (converted or unconverted) steal the missionaries' traps
• How would we solve M&C now?
• What if someone else chooses every other action?
Up next…
• This leads us to representing and reasoning about more
complex information
• But first…
– A detour into Game Playing
– or, How to do search when someone is trying to make life
miserable for you!
Game Playing – Chapter 6
• Why game playing?
– Interesting, engaging, people like competition
– Games seem to require intelligence, so perhaps a good AI testbed
▪ Games are the Drosophila of AI???
– Nice research domain:
▪ Simplified problems, easy to represent states and actions (usually)
▪ Yet maintains significant elements of real-world problems:
– Competition
– Limited resources (e.g., time)
– Solution strategies
– Uncertainty
▪ Due to the opponent's unknown moves, limited time to
calculate states, and possible random events (e.g., dice)
– Unlike the real world, it's possible to be fully informed
▪ But even that doesn't make it easy
Game Playing (cont.)
• Game playing is different from straightforward search
– Uncertainty due to opponent
– Much too hard to solve completely (branching factor is large)
– Time limit is very significant
▪ With limited time, a small change in efficiency makes a BIG
difference
▪ Because it is unlikely or impossible to calculate the exact
consequences of a move (find the goal state), it is necessary to
approximate
Game Playing (cont.)
• Games can be deterministic or non-deterministic
– Contains an element of chance: “probabilistic”, “stochastic”
• …and have perfect information or not
– “Accessible” – The world state is fully accessible to the program
or agent (nothing hidden)
                        Deterministic              Non-deterministic
Perfect information     Chess, Checkers, Go,       Backgammon, Monopoly
                        Othello, Tic-tac-toe
Imperfect information   Navigating a maze          Bridge, Poker, Scrabble
Formulating a game strategy
• Formal definition of a game:
– Initial state
– Operators
– Terminal test
– Utility function
• One move consists of two half-moves, or two ply
– One per player
• Our approach to finding a game strategy
– Consider how to approach the exact solution (perfect play)
– Techniques for choosing a good move when time is limited
▪ Prune the search to reduce costs
▪ Approximate by evaluating moves/states (heuristics)
The Minimax algorithm
• Determines the optimal strategy, and thus the best move,
for player MAX (whose opponent is MIN)
– Generate the whole game tree down to terminal states
– Evaluate each terminal state using the utility function
– Back up from the leaf nodes toward the root, one layer at a time,
using the MIN and MAX rules alternately
▪ MAX chooses the maximum value
▪ MIN chooses the minimum value
– When reaching the root node, MAX's best move is the child with
the highest value
• The minimax decision maximizes the utility under the
assumption that the opponent seeks to minimize it (and
uses the same evaluation function)
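This backup procedure is a short recursion; a sketch in Python, where the game interface (terminal_test, utility, successors) mirrors the formal definition above but the names are illustrative:

    def minimax_value(state, game, maximizing):
        # Back values up from the leaves: MAX layers take the maximum,
        # MIN layers the minimum.
        if game.terminal_test(state):
            return game.utility(state)
        values = [minimax_value(s, game, not maximizing)
                  for s in game.successors(state)]
        return max(values) if maximizing else min(values)

    def minimax_decision(state, game):
        # MAX's best move leads to the child with the highest backed-up value.
        return max(game.successors(state),
                   key=lambda s: minimax_value(s, game, maximizing=False))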
2-ply Minimax example
Which move to choose?
The minimax decision is move A1
Another example
• In the game below, it's your move. Which move will the
minimax algorithm choose – A, B, C, or D? What is the
minimax value of the root node and of nodes A, B, C, and D?
[Game-tree figure: MAX to move at the root, with moves A, B, C, and D leading to MIN nodes over numeric leaf utilities (values 1–9 in the figure)]
Multiplayer games
• (A, B, C) – utility values for players A, B, and C
[Figure annotations: "Player A could take either move here" and "Ignores strategies such as alliances…"]
Minimax search
• Generate the tree of minimax values
– Then choose the best (maximum) move
– Don't need to keep all values around
▪ Good memory property
• How deep to generate the tree?
– Generating the full tree is best, but not often feasible
– Generate N moves down, and use an evaluation function
▪ The larger N is, the better the program will play
• Depth-first search is used to implement minimax
– Recursive implementation (Fig. 6.3); a depth-limited sketch follows below
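A depth-limited variant of the earlier sketch: cut off after a fixed number of plies and apply an evaluation function at the frontier (names again illustrative, not the book's code):

    def minimax(state, depth, maximizing, game, evaluate):
        # Depth-first recursion; at depth 0 the evaluation function stands
        # in for the true utility of the unexplored subtree.
        if game.terminal_test(state):
            return game.utility(state)
        if depth == 0:
            return evaluate(state)
        values = [minimax(s, depth - 1, not maximizing, game, evaluate)
                  for s in game.successors(state)]
        return max(values) if maximizing else min(values)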