Algorithms of Artificial Intelligence
Lecture 4: Search
E. Tyugu
© Enn Tyugu

Backtracking on an and-or tree

• Let us have the following functions describing a search tree:
    succ(p)     - the set of immediate descendants of the node p
    terminal(p) - true iff p is a terminal node.
• We assume that the root of the search tree is an and-node.
• The following algorithm produces a subtree of an and-or tree given by its root p. This subtree is empty if the search is not successful. The algorithm starts with an empty state given as its second parameter.

A.2.17:
    ANDOR(p, state) =
      if terminal(p) and good(p) then state := p; success(state)
      elif terminal(p) and not good(p) then failure()
      else
        for x ∈ succ(p) do
          L: for y ∈ succ(x) do
               if empty(ANDOR(y, state)) then continue
               else state := (state, ANDOR(y, state)); break L   -- an alternative under x succeeded
               fi
             od;
             failure()   -- reached only if no alternative under x succeeded
        od;
        success(state)
      fi

Dependency-directed backtracking

[Figure: a search tree with nodes 0..14 and choices A, B, C, D, E, F; the recorded contradictions A&C and B&E let the search backtrack directly to the choice responsible for a conflict instead of to the most recent choice point.]

Branch-and-bound search

    select(M)  -- selects a set to be split from the set of sets M
    split(A)   -- splits the set A into a number of smaller sets
    prune(M)   -- finds the dominated sets in M and prunes them out
    card(M)    -- gives the maximum of the cardinalities of the sets in M
    bestin(M)  -- finds the best element in M.

A.2.18:
    M := {X};
    while card(M) > 1 do
      A := select(M);
      M := (M \ {A}) ∪ split(A);
      M := prune(M)
    od;
    bestin(M)

Branch-and-bound search continued

    lb(X) -- lower bound of the values of elements in X
    ub(X) -- upper bound of the values of elements in X

A.2.18-a:
    prune(M) =
      T := selectFrom(M);
      for X ∈ M do
        if lb(X) > ub(T) then T := X fi
      od;
      for X ∈ M do
        if lb(T) > ub(X) then M := M \ {X} fi
      od

Stochastic branch-and-bound search

It may be difficult to estimate the maximal and minimal values of a goal function f on a set.
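As a concrete illustration of the deterministic scheme A.2.18 combined with the bound-based prune of A.2.18-a, here is a minimal Python sketch. The goal function f, the interval representation of candidate sets, and the bound functions lb/ub are assumptions of this sketch, not part of the slides.

```python
# A minimal sketch of the generic branch-and-bound scheme A.2.18 with
# the bound-based prune of A.2.18-a, maximizing f(x) = x * (10 - x)
# over the integers 0..10.

def f(x):
    return x * (10 - x)

def split(interval):
    """split(A): divide an integer interval (a, b) into two halves."""
    a, b = interval
    mid = (a + b) // 2
    return [(a, mid), (mid + 1, b)]

def lb(interval):
    """Lower bound on the best value in the interval: f at an endpoint."""
    a, b = interval
    return max(f(a), f(b))

def ub(interval):
    """Upper bound: f is concave with its vertex at x = 5."""
    a, b = interval
    return max(f(a), f(b), f(min(max(a, 5), b)))   # clamp the vertex into [a, b]

def prune(M):
    """prune(M): drop every set whose upper bound is below some set's lower bound."""
    best_lb = max(lb(X) for X in M)
    return [X for X in M if ub(X) >= best_lb]

def branch_and_bound(a, b):
    M = [(a, b)]
    while any(hi > lo for lo, hi in M):   # card(M) > 1 in the slides' notation
        A = max(M, key=ub)                # select(M): take the most promising set
        M.remove(A)
        M = prune(M + split(A))
    return max((lo for lo, hi in M), key=f)   # bestin(M)

print(branch_and_bound(0, 10))   # -> 5, the maximizer of f
```

Here prune discards every interval whose upper bound falls below the best known lower bound, so only intervals that may still contain the maximizer are split further.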
One can use the following idea: instead of a deterministic search, perform a stochastic search in which the probability of testing a point is higher in the areas of the space where the values of f found so far are higher. This gives a good optimization algorithm for nonlinear multimodal optimization problems (see Strongin).

Stochastic branch-and-bound search

    distr                  - probability distribution for selecting a point in the search space
    random(distr)          - generator of random points in the search space with distribution distr
    modify(distr, x, f(x)) - adjusts the distribution, increasing the probability of selecting points with higher values of f.

A.2.19:
    distr := even;
    x := random(distr);
    while not good(x) do
      x := random(distr);
      distr := modify(distr, x, f(x))
    od

Binary search as branch-and-bound

The functions left(p), right(p) and val(p) select the left or right subtree and take the value of the node p (of the root of the tree).

A.2.19:
    binsearch(x, p) =
      if empty(p) then failure()
      elif val(p) = x then success()
      elif val(p) < x then binsearch(x, right(p))
      else binsearch(x, left(p))
      fi

Alpha-beta pruning

This algorithm performs heuristic search for the best moves on the game trees of two-player games, assuming that both players apply the best tactics, i.e. each makes as good a move as possible on the basis of the available knowledge. It is a concretisation of branch-and-bound search. It gives the same result as the minimax procedure for two-player games, and it lies at the core of many game programs for chess and other games.

Example

[Figure: a three-level game tree (maximize / minimize / maximize) with inner nodes 1, 2, 11, 12, 21, 22, leaves 111, ..., 222, and leaf values 4 1 8 5 1 2 7 5.]
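The pruning idea can be checked on the example tree. Below is a minimal Python sketch, written in the minimax form (equivalent to the negamax formulation of A.2.20) because it is easier to read off the figure; the nested-list encoding of the tree is an assumption of this sketch.

```python
# A runnable sketch of alpha-beta pruning in minimax form.  A tree is
# either a number (a leaf value for the maximizing player) or a list
# of subtrees.

def alphabeta(p, alpha, beta, maximizing):
    if not isinstance(p, list):          # empty(succ(p)): a leaf
        return p
    if maximizing:
        m = alpha
        for x in p:
            m = max(m, alphabeta(x, m, beta, False))
            if m >= beta:                # cutoff: the minimizer will avoid this branch
                return m
        return m
    else:
        m = beta
        for x in p:
            m = min(m, alphabeta(x, alpha, m, True))
            if m <= alpha:               # cutoff: the maximizer will avoid this branch
                return m
        return m

# The example tree from the slides: maximize / minimize / maximize,
# with leaf values 4 1 8 5 1 2 7 5.
tree = [[[4, 1], [8, 5]], [[1, 2], [7, 5]]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # -> 4
```

The result 4 matches the backed-up value at the root of the example tree.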
Example of alpha-beta pruning

[Figure: the same game tree with backed-up values: the maximizing nodes 11, 12, 21, 22 get the values 4, 8, 2, 7; the minimizing nodes 1 and 2 get the values 4 and 2; the maximizing root gets the value 4. The leaves are numbered 111, 112, 121, 122, 211, 212, 221, 222.]

Alpha-beta pruning

    alpha - the minimal value of the end result that the maximizing player can obtain for certain;
    beta  - the maximal value of the end result that the minimizing player must give away for certain.

A.2.20:
    alphabeta(p, alpha, beta) =
      if empty(succ(p)) then return(f(p))
      else
        m := alpha;
        for x ∈ succ(p) do
          t := -alphabeta(x, -beta, -m);
          if t > m then m := t fi;
          if m ≥ beta then return(m) fi   -- cutoff: the opponent will not allow this branch
        od;
        return(m)
      fi

A* algorithm

The problem it solves is finding a shortest path between two nodes of a graph, where, besides the lengths of the arcs of the graph, an estimate of a lower bound of the distance to the end node is given for every node. We shall use the following notations:

    s       -- the given initial node of the path
    g       -- the given end node of the path
    p(n,i)  -- length of the arc from the node n to the node i
    l(n)    -- length of the shortest path found so far from the initial node s to the node n
    h(n)    -- length of the shortest path from the node n to the end node g
    H(n)    -- estimate of the lower bound of h(n)
    Open    -- the set of nodes still to be checked
    succ(n) -- the successor nodes of n.

A* algorithm continued

A.2.21:
    l(s) := 0;
    Open := {s};
    while not empty(Open) do
      n := select(Open);
      for i ∈ Open \ {n} do
        if l(i) + H(i) < l(n) + H(n) then n := i fi
      od;
      if n = g then success()
      else
        Open := (Open ∪ succ(n)) \ {n};
        for i ∈ succ(n) do
          if l(i) is undefined or l(n) + p(n,i) < l(i) then l(i) := l(n) + p(n,i) fi
        od
      fi
    od;
    failure()

A* continued

The A* algorithm has some interesting properties:
1. Under the condition that H(n) ≤ h(n) for each node n of the graph (i.e. H never overestimates the remaining distance), the algorithm gives a precise answer -- it finds an actual shortest path from s to g.
2. Among the algorithms that solve this problem precisely using the same heuristic information, it is the fastest that can be constructed.
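A runnable sketch of A.2.21, using a priority queue keyed by l(n) + H(n) in place of the scan over Open; the example graph and the heuristic H are made-up illustrative data (H never overestimates the true remaining distance, so it is admissible).

```python
# A runnable sketch of the A* algorithm A.2.21 on a small directed graph.
import heapq

def a_star(graph, H, s, g):
    """graph: node -> list of (neighbour, arc length).  Returns (length, path)."""
    open_heap = [(H[s], 0, s, [s])]       # entries (l(n) + H(n), l(n), n, path)
    best = {s: 0}                         # best known l(n) for each reached node
    while open_heap:
        _, l_n, n, path = heapq.heappop(open_heap)
        if n == g:
            return l_n, path              # success(): shortest path found
        if l_n > best.get(n, float("inf")):
            continue                      # stale queue entry, skip it
        for i, p_ni in graph.get(n, []):
            l_i = l_n + p_ni
            if l_i < best.get(i, float("inf")):
                best[i] = l_i             # l(i) := l(n) + p(n, i)
                heapq.heappush(open_heap, (l_i + H[i], l_i, i, path + [i]))
    return None                           # failure(): g is unreachable

graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("g", 6)],
    "b": [("g", 2)],
}
H = {"s": 4, "a": 4, "b": 2, "g": 0}      # admissible lower-bound estimates
print(a_star(graph, H, "s", "g"))         # -> (5, ['s', 'a', 'b', 'g'])
```

The direct arc a -> g of length 6 is never expanded, because the queue entry for b with l(b) + H(b) = 5 is preferred.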
Unification

A substitution is a tuple of variable-expression pairs, for example ((x,A), (y,F(B)), (z,W)). We shall additionally require that no variable of a substitution appears in an expression of the same substitution. A substitution describes a transformation of expressions in which the variables of an expression are replaced by their corresponding expressions. The application of a substitution s to an expression E is denoted by E∘s.

Example: P(x,x,y,v) ∘ ((x,A),(y,F(B)),(z,W)) = P(A,A,F(B),v).

Unification continued

Substitutions s and t are independent if their replaceable variables are different. In this case we can build the composition s*t of the substitutions s and t, which is the substitution consisting of the pairs of both s and t.

If E,...,F are expressions, s is a substitution, and E∘s = ... = F∘s, then s is a unifier of E,...,F. The most general unifier of E,...,F, denoted by mgu(E,...,F), is a unifier s of E,...,F with the property that for any unifier t of E,...,F there exists a substitution r such that E∘s*r = E∘t, ..., F∘s*r = F∘t. In this sense s is the least constraining unifier of E,...,F. If mgu(E,...,F) exists, it is unique (up to the renaming of variables); this follows immediately from the definition.

Examples

Unification, i.e. finding the mgu, can be used directly for answering queries to knowledge bases. For example, if a knowledge base about people contains the facts

    married(Peter, Ann)
    married(Jan, Linda)

one can ask a question about Linda's marriage by unifying the formula married(x, Linda) with the facts of the knowledge base. The answer ((x,Jan)) shows that Linda is married to Jan.

Unification algorithm

Let us have the functions:
    var(x)        -- true if x is a variable
    const(x)      -- true if x is a constant
    MguVar(x,y)   -- gives the mgu of the variable x and the expression y (not a variable) if it exists, and fails if x occurs in y
    includes(x,y) -- true if y includes x.
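With these helper functions, the unification algorithm (given below as A.2.22) can be sketched in Python. The term representation is an illustrative convention of this sketch: tuples are compound expressions, single lowercase letters are variables, and everything else is a constant.

```python
# A runnable sketch of a unification algorithm in the spirit of A.2.22.
# Convention (an assumption of this sketch): single lowercase letters
# are variables; other strings are constants; tuples are compound terms.

def is_var(t):
    return isinstance(t, str) and len(t) == 1 and t.islower()

def subst(term, s):
    """Apply the substitution s (a dict variable -> expression) to a term."""
    if isinstance(term, tuple):
        return tuple(subst(t, s) for t in term)
    return s.get(term, term) if is_var(term) else term

def includes(x, y):
    """Occurs check: does the variable x occur in the expression y?"""
    return y == x or (isinstance(y, tuple) and any(includes(x, t) for t in y))

def mgu(x, y):
    """Return the most general unifier as a dict, or None on failure."""
    if x == y:
        return {}                                    # the empty substitution
    if is_var(x):
        return None if includes(x, y) else {x: y}    # MguVar(x, y)
    if is_var(y):
        return None if includes(y, x) else {y: x}    # MguVar(y, x)
    if not (isinstance(x, tuple) and isinstance(y, tuple)) or len(x) != len(y):
        return None                                  # constant clash or arity clash
    g = {}
    for xi, yi in zip(x, y):                         # unify part by part
        s = mgu(subst(xi, g), subst(yi, g))
        if s is None:
            return None
        g = {v: subst(e, s) for v, e in g.items()}   # compose(g, s)
        g.update(s)
    return g

# The knowledge-base query from the slides:
print(mgu(("married", "x", "Linda"), ("married", "Jan", "Linda")))
# -> {'x': 'Jan'}
```

The example P(x,x,y,v) ∘ ((x,A),(y,F(B)),(z,W)) corresponds here to subst(("P","x","x","y","v"), {"x": "A", "y": ("F","B"), "z": "W"}).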
Then the unification algorithm is as follows.

A.2.22:
    Mgu(x,y) =
      if x = y then success(())   -- the empty substitution
      elif var(x) then success(MguVar(x,y))
      elif var(y) then success(MguVar(y,x))
      elif (const(x) ∨ const(y)) & x ≠ y ∨ length(x) ≠ length(y) then failure()
      else
        g := ();
        for i := 1 to length(x) do
          s := Mgu(part(x,i), part(y,i));
          if s = failure then failure() fi;
          g := compose(g,s);
          x := subst(x,g);
          y := subst(y,g)
        od;
        success(g)
      fi

    MguVar(x,y) = if includes(x,y) then failure() else ((x,y)) fi

Dictionary search

Find a given word in a dictionary where the words are ordered lexicographically. The frequency of occurrence of the i-th letter of the alphabet is p_i. The total frequency of occurrence of the letters preceding the i-th letter of the alphabet is

    q_i = p_1 + p_2 + ... + p_{i-1}.

If the dictionary contains l pages, then an approximate value of the page number h(w) for the word w, consisting of the letters with the alphabet numbers i1, i2, ..., in, can be computed from the frequencies of these letters as follows:

    h(w) = 1 + l·(q_{i1} + p_{i1}·(q_{i2} + p_{i2}·( ... + p_{i(n-1)}·q_{in} ... ))).

Dictionary search

• Let a word w be selected on a given page k and its expected page number h(w) be computed according to the formula above. This page number is the value of a new function, denoted by f(k). In order to find the page number j of the word v, we have to solve the equation f(j) = h(v) with respect to j (h(v) is a constant for the word v).
• We use the following variables:
    - j - the currently expected page number of the word v (j equals h(v) at the beginning of the search)
    - a - the number of the first page to search
    - b - the number of the last page to search
• The algorithm recursively computes a better approximation to j until the right value is found. Here good(j) means that the given word is on the page j.
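The same interpolation idea, applied to the simpler problem of locating a value in a strictly increasing array (the index plays the role of the page number and the stored value the role of f), can be sketched as follows; the function names are assumptions of this sketch.

```python
# A sketch of the interpolation idea behind dictionary search, for a
# strictly increasing array arr.  The recursion mirrors dict(j, a, b, v).

def dict_search(arr, v, j, a, b):
    """Return an index i with arr[i] == v in arr[a..b], or None."""
    if a > b:
        return None                      # v is not in the array
    j = min(max(j, a), b)                # keep the estimate inside [a, b]
    if arr[j] == v:                      # good(j)
        return j
    if j == a:
        return dict_search(arr, v, j + 1, a + 1, b)
    if j == b:
        return dict_search(arr, v, j - 1, a, b - 1)
    s = arr[j]                           # s = f(j)
    if s > v:                            # overshoot: interpolate in [a, j]
        step = max((j - a) * (s - v) // (s - arr[a]), 1)
        return dict_search(arr, v, j - step, a, j)
    step = max((b - j) * (v - s) // (arr[b] - s), 1)
    return dict_search(arr, v, j + step, j, b)   # undershoot: interpolate in [j, b]

def interp_dict_search(arr, v):
    """Compute the initial estimate (the role of h(v)) and start the recursion."""
    if not arr:
        return None
    a, b = 0, len(arr) - 1
    j0 = a + (b - a) * (v - arr[a]) // max(arr[b] - arr[a], 1)
    return dict_search(arr, v, j0, a, b)

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(interp_dict_search(primes, 13))   # -> 5, the index of 13
```

On data whose values grow roughly uniformly, the interpolated estimate lands near the target immediately, which is exactly the effect the letter-frequency formula aims at for dictionary pages.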
Dictionary search

A.2.23:
    dict(j, a, b, v) =
      if good(j) then j
      elif j = a then dict(j+1, a+1, b, v)
      elif j = b then dict(j-1, a, b-1, v)
      else
        s := f(j);
        if s > h(v) then dict(j - (j-a)·(s-h(v))/(s-f(a)), a, j, v)
        else dict(j + (b-j)·(h(v)-s)/(f(b)-s), j, b, v)
        fi
      fi

Viterbi algorithm

This is an algorithm of so-called compound decision theory that uses the dynamic programming paradigm. It is a word recognition algorithm: it finds the most likely word for a sequence of patterns x1, ..., xm that represent symbols (letters). The goal is to choose the assignment w1, ..., wm of letters from a given alphabet L that has the maximum probability over all r^m possible assignments (r is the number of letters of the alphabet L). Assuming the first-order Markov property of the sequence of patterns, i.e. that the probability of the appearance of a pattern in the sequence depends only on the preceding pattern, we get an algorithm with time complexity O(m·r²) instead of O(r^m). An improved version of the algorithm is called the DVA (Dictionary Viterbi Algorithm); it also takes a dictionary into account, so that only words from the dictionary are accepted.
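The dynamic-programming recurrence described above can be sketched in Python; the alphabet, the pattern sequence, and the probability tables p1 and p below are made-up toy data, used only to make the sketch runnable.

```python
# A runnable sketch of the Viterbi recognition scheme.  prob[w] and
# pref[w] play the roles of p_w and pref_w: the probability and the
# letters of the best prefix ending with the letter w.

def viterbi(xs, L, p1, p):
    """Return the most probable letter sequence for the patterns xs."""
    prob = {w: p1(xs[0], w) for w in L}
    pref = {w: w for w in L}
    for x in xs[1:]:
        new_prob, new_pref = {}, {}
        for w in L:
            # choose the predecessor z maximizing p_z * p(x, w, z)
            zz = max(L, key=lambda z: prob[z] * p(x, w, z))
            new_prob[w] = prob[zz] * p(x, w, zz)
            new_pref[w] = pref[zz] + w
        prob, pref = new_prob, new_pref
    best = max(L, key=lambda w: prob[w])
    return pref[best]

# Toy model: a pattern mostly "emits" its own letter, and "u" strongly
# tends to follow "q" -- all numbers are illustrative.
L = ["q", "u", "i"]
def p1(x, w):
    return 0.8 if x == w else 0.1
def p(x, w, u):
    emit = 0.8 if x == w else 0.1
    trans = 0.9 if (u, w) == ("q", "u") else 0.3
    return emit * trans

print(viterbi(["q", "i", "i"], L, p1, p))   # prints "qii"
```

At every position only the best prefix per letter is kept, which is what brings the cost down from O(r^m) to O(m·r²).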
Viterbi algorithm

    p1(x,w)       - probability that the letter w with the pattern x is the first letter of a word
    p(x,w,u)      - probability that the letter w with the pattern x follows the letter u in a word
    p_w           - probability of the best prefix ending with the letter w
    pref_w        - best prefix ending with the letter w
    selectFrom(L) - produces an element of L.

A2.24:
    for w1 ∈ L do p_w1 := p1(x1, w1); pref_w1 := (w1) od;
    for i := 2 to m do
      for wi ∈ L do
        zz := selectFrom(L);
        p := p_zz * p(xi, wi, zz);
        for z ∈ L do
          if p_z * p(xi, wi, z) > p then zz := z; p := p_z * p(xi, wi, z) fi
        od;
        p_wi := p; pref_wi := (pref_zz, wi)
      od
    od

Dictionary Viterbi Algorithm

    inDictionary(z)     - true iff there is a word in the dictionary beginning with z
    selectFrom(pref, L) - gives a letter that suits the prefix pref, so that the prefix is always in the dictionary.

A2.25:
    for w1 ∈ L do p_w1 := p1(x1, w1); pref_w1 := (w1) od;
    for i := 2 to m do
      for wi ∈ L do
        zz := selectFrom(pref_zz, L);
        p := p_zz * p(xi, wi, zz);
        for z ∈ L do
          if p_z * p(xi, wi, z) > p and inDictionary((pref_z, wi)) then zz := z; p := p_z * p(xi, wi, z) fi
        od;
        p_wi := p; pref_wi := (pref_zz, wi)
      od
    od

Hierarchy of search methods

[Figure: hierarchy of the search methods treated in the course, with "search" at the root and branches including breadth-first, depth-first, lru, rlu, backtracking, dependency-directed backtracking, best-first, A*, beam search, branch and bound with the special cases binary search, dictionary search and stochastic branch-and-bound, Viterbi search, unification, and search-space modifications.]
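The dictionary restriction of A2.25 can be sketched by testing every candidate prefix against the set of prefixes of dictionary words, which plays the role of inDictionary; the dictionary, alphabet, and probability tables below are illustrative toy data.

```python
# A sketch of the Dictionary Viterbi Algorithm: the Viterbi recurrence
# is restricted so that every kept prefix is a prefix of some
# dictionary word.

def prefixes(dictionary):
    """All prefixes of all dictionary words, for the inDictionary test."""
    return {w[:i] for w in dictionary for i in range(1, len(w) + 1)}

def dva(xs, L, p1, p, dictionary):
    pref_set = prefixes(dictionary)
    # prob[w], pref[w]: probability and letters of the best dictionary
    # prefix ending with the letter w (probability 0 if there is none)
    prob = {w: (p1(xs[0], w) if w in pref_set else 0.0) for w in L}
    pref = {w: w for w in L}
    for x in xs[1:]:
        new_prob, new_pref = {}, {}
        for w in L:
            best_p, best_z = 0.0, None
            for z in L:
                cand = pref[z] + w
                if prob[z] * p(x, w, z) > best_p and cand in pref_set:
                    best_p, best_z = prob[z] * p(x, w, z), z
            new_prob[w] = best_p
            new_pref[w] = (pref[best_z] + w) if best_z is not None else ""
        prob, pref = new_prob, new_pref
    best = max(L, key=lambda w: prob[w])
    return pref[best]

# Toy model: every pattern mostly "emits" its own letter.
L = ["q", "u", "i"]
def p1(x, w):
    return 0.8 if x == w else 0.1
def p(x, w, u):
    return (0.8 if x == w else 0.1) * 0.5

print(dva(["q", "i", "i"], L, p1, p, ["qui", "quu"]))   # -> "qui"
```

Without the dictionary, this toy model would keep the most likely emission letter at every step and recognize "qii"; the prefix test forces the result to stay a dictionary word.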