
Gradient Descent and Related Methods: Hill-climbing & Optimization
Holger Schultheis
Nov 4th, 2014

Parameter Estimation as Optimization
•  Recall from last session:
   –  Free parameters of a model are parameters whose values cannot be fixed a priori
•  Parameter estimation tries to determine “good” values for these parameters
   –  What counts as “good” is context dependent
•  Finding good values is essentially an optimization problem

Optimization
•  an AI search technique
•  goal: find the optimal solution to a problem
•  the path to the solution does not matter
•  e.g. in integrated-circuit design, vehicle routing, network optimization, layout problems, …

8-Queens Problem
•  place 8 queens on a chessboard such that no queen attacks any other
•  example: 8-queens problem …

Solution by Local Search
•  use the current state, move to neighboring states
•  no multiple-path search
•  no systematic search
•  advantages:
   –  constant amount of memory
   –  reasonable solutions in large/infinite/continuous state spaces

Objective Function
•  states are evaluated using an objective function

State Space Landscape

Hill-Climbing Search
•  continually moves in the direction of increasing value
•  terminates when no neighbor has a higher value
•  no look-ahead beyond immediate neighbors
•  algorithm:

In the 8-Queens Example...
•  all queens on board, one per column
•  successor function returns all possible successor states
   –  moving one queen to another square in her column
   –  8 * 7 = 56 successors
•  cost function: number of pairs of queens attacking each other
•  global minimum: 0

Hill Climbing
•  "greedy" local search
   –  grabs a good neighbor without thinking ahead
   –  often rapid progress towards a solution
   –  e.g. just 5 steps here:

Cost Function & Local Minimum
•  this state: h=17
   –  best successors: h=12
   –  typically random selection of successor
•  local minimum with h=1
   –  every successor has higher cost

Hill-Climbing Problems
•  local maxima
   –  gets stuck in a sub-optimal solution
•  ridges
   –  sequence of local maxima
•  plateaux
   –  flat local maximum or shoulder
   –  however...

Performance in 8-Queens Problem
•  hill climbing gets stuck in 86% of cases when starting from a random configuration
•  takes 3-4 steps on average to get stuck or to succeed
•  good performance for a state space of approx. 17 million states
•  improvement: using sideways moves on plateaux raises the success rate from 14% to 94%

Hill-Climbing: Variants
•  stochastic hill-climbing
   –  chooses randomly among the improving successors
   –  sometimes better than steepest ascent
•  first-choice hill-climbing
   –  generates successors randomly and picks the first one that improves on the current state
   –  good for states with many successors
•  random-restart hill-climbing
   –  restarts from a randomly generated initial state when it fails
   –  roughly 7 iterations needed for the 8-queens problem

Hill-Climbing: Conclusion & Beyond
•  hill-climbing depends on the shape of the state-space landscape
•  so far: discrete environments
•  however: most real-world environments are continuous
   –  infinitely many states in the state space

An Example
•  planning of 3 new airports in a country
•  distances from each city in the country to its nearest airport should be minimal
•  state space defined by the coordinates of the airports

An Example, cont'd
•  moving in state space = moving airports on the map
•  objective function f(x1, y1, x2, y2, x3, y3)
   –  easy to compute for a particular state
   –  hard to describe in general
   –  (x1, y1), (x2, y2), (x3, y3)
   –  6 variables
   –  6-dimensional space
•  this is a mathematical optimization problem!
   –  6-dimensional vector of variables: x

Mathematical Optimization
•  formulation and solution of a constrained optimization problem:

Objective Function, 2D Examples

Contour Representations with constraints

Optimization Techniques
•  how to deal with complex higher-dimensional objective functions?
•  solution: use the gradient of the state-space landscape
•  what does that mean?
•  compare to the 1st derivative in the 1D case

Gradient Vector of f(x)
•  For a function f(x) there is at any point x = (x1, ..., xn) a vector of first-order partial derivatives (gradient vector):
   $g(x) = \nabla f(x) = \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right]^T$
   –  the first entry is the 1st partial derivative w.r.t. x1

Gradient Vector of f(x)
•  The gradient of the objective function is a vector that gives the magnitude and direction of the steepest slope
•  ... or visualized: in $\nabla f(x)$, $\nabla$ is the 1st derivative operator and f the objective function

Gradient Vector, depicted
•  The gradient vector g(x) is perpendicular to the contours and points in the direction of maximum increase of f(x).

Unconstrained Minimization
•  Considering the unconstrained problem of minimizing f(x)
•  Questions:
   –  What are the conditions for a minimum?
   –  Is the minimum unique?
   –  Are there any relative minima?
•  different types of minima...

Types of minima, single variable

Types of minima, two variables

General Method
•  for computing local minima: solve the system of equations g(x) = 0 (necessary condition)
•  solution by Newton's method (also called Newton-Raphson method)

Newton Method Illustrated
•  Iterative computation of successively better approximations of roots (zeroes) of a function

Optimization by Newton
•  We are looking for the roots of the gradient
•  => We need the second derivative

Hessian Matrix of f(x)
•  After German mathematician Ludwig Otto Hesse (1811-74)
•  Square matrix of 2nd-order partial derivatives of a function:
   $H(x) = \left[ \frac{\partial^2 f}{\partial x_j \, \partial x_k} \right]_{j,k = 1, \ldots, n}$
•  This is given as the Hessian Matrix.
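
The idea of finding a root of the gradient with the help of the Hessian can be illustrated with a small sketch. Everything specific here is an assumption for illustration and not taken from the slides: the test function f(x, y) = e^x - x + (y - x)^2 (minimum at (0, 0)), the function names, the tolerance, and the iteration cap. The 2x2 linear system H * step = -g is solved by Cramer's rule to keep the example free of external libraries.

```python
import math

def grad(x, y):
    # Gradient of the hypothetical test function f(x, y) = exp(x) - x + (y - x)**2.
    return (math.exp(x) - 1.0 - 2.0 * (y - x), 2.0 * (y - x))

def hessian(x, y):
    # Matrix of 2nd-order partial derivatives of the same test function.
    return ((math.exp(x) + 2.0, -2.0), (-2.0, 2.0))

def newton_minimize(x, y, tol=1e-10, max_iter=50):
    """Newton iteration on the gradient: repeatedly solve H * step = -g."""
    for i in range(max_iter):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol:      # necessary condition g = 0 (approximately)
            return x, y, i
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        if abs(det) < 1e-14:
            raise ArithmeticError("Hessian is (nearly) singular")
        # Cramer's rule for the 2x2 system H * (sx, sy) = (-gx, -gy).
        sx = (-gx * d + b * gy) / det
        sy = (-a * gy + c * gx) / det
        x, y = x + sx, y + sy
    return x, y, max_iter

x_star, y_star, steps = newton_minimize(1.0, -1.0)
print(x_star, y_star, steps)
```

Note that the Hessian is re-evaluated in every pass of the loop, which is exactly the per-iteration cost that makes the plain method unattractive for large problems.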
Solution by Newton's Method
•  $x_{i+1} = x_i - H(x_i)^{-1} g(x_i)$ for i=0,1,2,..., with g the gradient and H the Hessian matrix of f
•  Hopefully the iterates converge to the minimum x*

Problems with Newton's Method
•  The method does not always converge, even if x0 is close to x*
•  The method requires the computation of the Hessian matrix at each iteration
•  Thus, in practice the simple basic Newton method is not recommended...

Better: Line Search Descent Algorithm
•  Line search descent methods
•  Use an initial estimate x0 of the optimum point
•  Generate a sequence of better estimates by successively searching along a direction of descent
•  Terminate if no further progress is made or if the necessary condition is satisfied with sufficient accuracy

Descent Condition & Illustration
•  Descent Condition: $g(x_i)^T u_{i+1} < 0$
   –  the left-hand side is the directional derivative of f(xi) in the direction ui+1
   –  ui+1 denotes a descent direction, i.e. the directional derivative is negative
•  Sequence of line search descent directions and steps:

Even Better:
•  Method of steepest descent: line search in the direction of steepest descent
•  Steepest descent direction: $u_{i+1} = -g(x_i) / \lVert g(x_i) \rVert$
•  Successive steepest descent directions are orthogonal (when each line search is exact): $u_{i+1}^T u_i = 0$

Convergence Criteria
•  Stop, if one or a combination of the following criteria is fulfilled:

Empirical Gradient Descent
•  What if the (partial) derivatives of the objective function are unknown?
•  Empirical gradient: evaluating the change in the objective function for small changes in each coordinate
•  Empirical Gradient Descent: hill climbing in a discretized version of the state space.

To Conclude...
•  optimization in AI: ad hoc hill-climbing & improvements
•  objective function & contour representations
•  in general: mathematical optimization
•  gradient descent by line search
•  there is more...
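
Steepest descent with an empirical gradient can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method from the slides: the quadratic test objective (minimum at (3, -1)), the central-difference step h, the tolerance, and all function names are hypothetical, and a simple backtracking (step-halving) rule with a sufficient-decrease test stands in for the line search, which the slides do not specify. The stopping rule used here is a small gradient norm, one plausible convergence criterion.

```python
import math

def f(p):
    # Hypothetical test objective with its minimum at (3, -1).
    x, y = p
    return (x - 3.0) ** 2 + 10.0 * (y + 1.0) ** 2

def empirical_gradient(func, p, h=1e-6):
    """Estimate each partial derivative from small changes in one coordinate."""
    grad = []
    for k in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[k] += h
        backward[k] -= h
        grad.append((func(forward) - func(backward)) / (2.0 * h))
    return grad

def steepest_descent(func, x0, tol=1e-6, max_iter=1000):
    x = list(x0)
    for i in range(max_iter):
        g = empirical_gradient(func, x)
        g_norm_sq = sum(gk * gk for gk in g)
        if math.sqrt(g_norm_sq) < tol:        # necessary condition nearly satisfied
            return x, i
        u = [-gk for gk in g]                 # descent direction: negative gradient
        fx = func(x)
        step = 1.0
        accepted = False
        while step > 1e-12:                   # backtracking line search along u
            candidate = [xk + step * uk for xk, uk in zip(x, u)]
            if func(candidate) <= fx - 1e-4 * step * g_norm_sq:
                accepted = True
                break
            step *= 0.5
        if not accepted:                      # no further progress possible
            return x, i
        x = candidate
    return x, max_iter

x_min, iterations = steepest_descent(f, [0.0, 0.0])
print(x_min, iterations)
```

Because the step is only halved until the objective decreases enough, the line search is inexact, so successive directions are generally not exactly orthogonal here.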

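Returning to the discrete case, the 8-queens formulation from the hill-climbing slides (one queen per column, 56 successors, cost = number of attacking pairs, random restart on failure) can be sketched as follows. The function names, the fixed seed, and the restart cap are assumptions made for illustration; ties among the best successors are broken randomly, and sideways moves are not used.

```python
import random

def cost(state):
    """Number of attacking queen pairs; state[c] is the row of the queen in column c."""
    n = len(state)
    pairs = 0
    for a in range(n):
        for b in range(a + 1, n):
            same_row = state[a] == state[b]
            same_diagonal = abs(state[a] - state[b]) == b - a
            if same_row or same_diagonal:
                pairs += 1
    return pairs

def hill_climb(start, rng):
    """Greedy descent on the cost; stops when no successor has lower cost."""
    current = list(start)
    current_cost = cost(current)
    n = len(current)
    while True:
        best_cost = current_cost
        best_moves = []
        for col in range(n):                  # n * (n - 1) successors in total
            original_row = current[col]
            for row in range(n):
                if row == original_row:
                    continue
                current[col] = row
                c = cost(current)
                if c < best_cost:
                    best_cost, best_moves = c, [(col, row)]
                elif c == best_cost and best_moves:
                    best_moves.append((col, row))
            current[col] = original_row
        if not best_moves:                    # local (or global) minimum reached
            return current, current_cost
        col, row = rng.choice(best_moves)     # random choice among best successors
        current[col] = row
        current_cost = best_cost

def random_restart_hill_climb(n=8, max_restarts=1000, seed=0):
    rng = random.Random(seed)
    for restart in range(max_restarts):
        start = [rng.randrange(n) for _ in range(n)]
        state, c = hill_climb(start, rng)
        if c == 0:
            return state, restart
    return None, max_restarts

solution, restarts = random_restart_hill_climb()
print(solution, restarts)
```

The returned list gives the row of the queen in each column; the second value counts how many restarts were needed before a state with cost 0 was reached.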