Strongly polynomial time algorithms
Our three formalisms
• Linear programs
– Algorithm: Simplex algorithm
• Integer linear programs
– Algorithm: Branch-and-bound (based on simplex)
• Network flow
– Network simplex + Klein’s algorithm
• All these algorithms use exponential time on
some instances.
Worst-case efficient algorithms
• Can we get algorithms running in polynomial
time?
• We say that an algorithm runs in polynomial
time if its running time is upper bounded by a
polynomial in the size of the input.
• What exactly do we mean by this?
Polynomials
• T(n) ≤ 12n^7 − 7n^4 + n
• T(n) = O(n^c) for some constant c.
• What is n?
– The size of the input, but what is ”size”?
• What is T(n)?
– The running time, but how do we measure it?
Models of computation
Model 1:
• Our computer holds exact real values.
• The input is given by some real matrices. The
size of the input is the number of entries in
the matrices.
• We perform an arithmetic operation (+,-,*,/)
in each step of computation.
• The output is the exact real result.
Models of computation
Model 2:
• Our computer holds bits (bytes, words).
• The input is given by rational matrices. The size of
the input is the total number of bits in the
matrices (each number being described in binary
notation).
• We perform some logical bit operation in each
step of computation (or word operations, but we
“charge” for each bit operation).
• The output is the exact rational result.
Model 1 vs. Model 2
• Model 1 is closer to the way we usually think about
algorithms operating with real numbers. It is a very useful
abstraction.
• Model 2 is closer to reality. Model 1 cannot be
implemented in a 100% faithful way. Model 2 can (and is).
• An algorithm in Model 1 can be converted to an algorithm
in Model 2 if it does not rely on very large numbers and we
restrict the input to rational numbers.
• The terminology “polynomial time algorithm” standardly
refers to Model 2 (bits and gates).
• The terminology “strongly polynomial time algorithm”
refers to Model 1 (numbers and arithmetic).
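As an illustration of Model 2's size measure, here is a small Python sketch (helper names are my own) that counts the bits needed to write down a rational matrix:

```python
from fractions import Fraction

def bit_size(q: Fraction) -> int:
    # bits for numerator and denominator in binary, plus a sign bit
    return q.numerator.bit_length() + q.denominator.bit_length() + 1

def input_size(matrix) -> int:
    # Model 2: the size of the input is the total number of bits
    # over all entries of the matrix
    return sum(bit_size(Fraction(x)) for row in matrix for x in row)
```

Note that under this measure the size grows with the magnitude of the numbers, not just the number of entries, which is exactly the difference from Model 1.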
State-of-the-art
• Network flow (and totally unimodular LPs in general):
– Strongly polynomial time algorithm:
• Flow fixing (Éva Tardos, 1985)
• Minimum mean cost cycle cancellation algorithm (Goldberg-Tarjan, 1989)
• Linear programming:
– Polynomial time algorithms:
• The ellipsoid algorithm (Khachiyan, 1979)
• Interior point algorithms (Karmarkar, 1984)
– Competitive with the simplex algorithm in practice
– A strongly polynomial time algorithm is an open problem!
• Integer linear programming:
– A polynomial time algorithm exists if and only if P=NP.
– This result extends to many of the problems we model using ILPs
• E.g. there is a polynomial time algorithm for TSP if and only if P=NP.
• Much more in the course ”Combinatorial Search”!
Worst case polynomial time algorithms
for Linear Programming
The Ellipsoid algorithm (Khachiyan, 1979).
• Theoretical breakthrough.
• Algorithm far from being efficient in practice.
• Spurred new research into LP algorithms.
Interior Point algorithms (Karmarkar, 1984).
• Algorithm efficient in practice.
• Tons of followup research and many interior
point algorithms developed.
• Best algorithms often beat the simplex algorithm.
Ellipsoid algorithm
Idea of algorithm:
• Enclose (possibly only a sufficiently large part of) the feasible region within a large ball (a ball is a special case of an ellipsoid).
• Test if center of ellipsoid is feasible.
• If not, the ellipsoid is partitioned into two parts
by a hyperplane, with all possible feasible points
in one part.
• Find a new smaller ellipsoid that encloses this
half-ellipsoid (pick the smallest), and repeat.
Finding a half-ellipsoid
Finding the new ellipsoid
Why this works
• Each iteration decreases the volume by a factor 2^(−1/(2(n+1))).
• The first ellipsoid has volume less than (2n²2^(2L))^n.
• If the system is feasible, the feasible region within the ball B(0, n2^L) has volume at least 2^(−(n+2)L).
• Hence at most K = 16n(n+1)L iterations are needed, since:
(2n²2^(2L))^n · 2^(−K/(2(n+1))) = (2n²2^(2L))^n · 2^(−8nL) < 2^(−(n+2)L).
Technical issue
• The feasible region may not be of full
dimension, and hence can be non-empty but
with volume 0!
• Solution: Replace the system Ax ≤ b with Ax < b′, where b′_i = b_i + 1/2^(2L).
• Then Ax ≤ b is feasible if and only if Ax < b′ is feasible.
Ellipsoids
An ellipsoid in R^n is defined as follows:
• Let Q be a non-singular n × n matrix, and let t be a vector.
• This defines an affine transformation T(x) = Qx + t.
• The ellipsoid determined by T is the image T(S_n) of the unit ball S_n = {x | x⊺x ≤ 1}.
• T(S_n) = {x | (x − t)⊺(QQ⊺)^(−1)(x − t) ≤ 1}.
• A matrix of the form B = QQ⊺ is called positive definite; the ellipsoid is then {x | (x − t)⊺B^(−1)(x − t) ≤ 1}.
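A quick numerical sanity check of this definition (my own toy helper, pure Python): a point y lies in T(S_n) exactly when Q^(−1)(y − t) lies in the unit ball.

```python
def in_ellipsoid(y, Q_inv, t):
    # y is in T(S_n)  iff  Q^{-1}(y - t) has Euclidean norm <= 1
    n = len(t)
    v = [sum(Q_inv[i][j] * (y[j] - t[j]) for j in range(n))
         for i in range(n)]
    return sum(c * c for c in v) <= 1.0

# axis-aligned example: Q = diag(2, 3), so Q_inv = diag(1/2, 1/3), t = (1, 0)
Q_inv = [[0.5, 0.0], [0.0, 1.0 / 3.0]]
```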
The algorithm
Input: System Ax < b
1. let t_0 := 0, B_0 := n²2^(2L) I
2. for j := 0 to K = 16n(n+1)L do
(Current ellipsoid is given by B_j and t_j)
a) if At_j < b, return t_j
b) else, find i such that a_i⊺t_j ≥ b_i
c) let t_(j+1) := t_j − (1/(n+1)) · B_j a_i / √(a_i⊺ B_j a_i)
d) and B_(j+1) := (n²/(n²−1)) · (B_j − (2/(n+1)) · B_j a_i a_i⊺ B_j / (a_i⊺ B_j a_i))
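The iteration above can be sketched in Python. This is a toy sketch using plain floating point instead of the careful limited-precision arithmetic the analysis requires, and with the ball radius R and iteration cap K supplied by the caller rather than derived from L:

```python
import math

def ellipsoid_feasible(A, b, R, K):
    # Search for x with A x < b (strict), assuming the feasible region
    # (if nonempty) meets the ball of radius R around the origin.
    n = len(A[0])
    t = [0.0] * n                                   # t_0 = 0
    B = [[R * R if i == j else 0.0 for j in range(n)]
         for i in range(n)]                         # B_0 = R^2 * I
    for _ in range(K):
        # find a violated inequality a_i^T t >= b_i, if any
        viol = next((i for i, row in enumerate(A)
                     if sum(r * c for r, c in zip(row, t)) >= b[i]), None)
        if viol is None:
            return t                                # center is feasible
        a = A[viol]
        Ba = [sum(B[i][j] * a[j] for j in range(n)) for i in range(n)]
        g = math.sqrt(sum(a[i] * Ba[i] for i in range(n)))
        # move the center away from the violated constraint
        t = [t[i] - Ba[i] / ((n + 1) * g) for i in range(n)]
        # shrink to the smallest ellipsoid containing the feasible half
        f = n * n / (n * n - 1.0)
        B = [[f * (B[i][j] - 2.0 * Ba[i] * Ba[j] / ((n + 1) * g * g))
              for j in range(n)] for i in range(n)]
    return None                                     # declare infeasible
```

For example, with the box constraints 2 < x_1 < 4, 2 < x_2 < 4 and R = 10, the volume argument above guarantees a feasible center is found well within a few hundred iterations.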
Implementation issues
The algorithm involves taking a square root!
Solution:
• Compute with limited precision.
• In the analysis make each new ellipsoid slightly
bigger to account for introduced errors.
• Double the number of iterations to account for
larger volumes.
Unique feature of the Ellipsoid
Algorithm
• We do not need an explicit description of the
system of inequalities.
• We just need an efficient separation oracle:
– An algorithm that, given an infeasible point,
finds a violated inequality.
• We can potentially solve systems with
exponentially many inequalities!
• This is utilized in many (theoretical)
applications of the Ellipsoid method.
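A separation oracle is just a procedure with the following interface. This is a toy example of my own for the box −1 < x_i < 1; the point of the technique is that the same interface works for families with exponentially many inequalities (e.g. subtour-elimination constraints), which the oracle checks without ever listing them:

```python
def box_oracle(t):
    # Separation oracle for the implicit system  -1 < x_i < 1  (i = 1..n):
    # return a violated inequality (a, beta) with a.t >= beta,
    # or None if t is strictly feasible.
    n = len(t)
    for i, v in enumerate(t):
        if v >= 1.0:
            a = [0.0] * n; a[i] = 1.0
            return a, 1.0                 # violated:  x_i < 1
        if v <= -1.0:
            a = [0.0] * n; a[i] = -1.0
            return a, 1.0                 # violated: -x_i < 1
    return None
```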
Interior point algorithms
Idea of algorithm:
• A polyhedron has complex structure on the
border with exponentially many corners,
edges, etc. Navigating there takes time.
• Well inside the polyhedron there is no
structure. One can navigate directly towards
the optimum point without obstacles.
The linear program and its dual
P: Maximize c⊺x subject to Ax ≤ b, x ≥ 0.
D: Minimize b⊺y subject to A⊺y ≥ c, y ≥ 0.
Add slack variables:
P: Maximize c⊺x subject to Ax + w = b, x, w ≥ 0.
D: Minimize b⊺y subject to A⊺y − z = c, y, z ≥ 0.
Conversion to barrier problem
Maximize c⊺x
subject to Ax + w = b, x, w ≥ 0
Change to:
Maximize c⊺x + μ ∑_j log x_j + μ ∑_i log w_i
subject to Ax + w = b
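The effect of the log terms can be seen numerically (my own small helper): the barrier objective plunges toward −∞ as any x_j or w_i approaches 0, which keeps iterates away from the boundary.

```python
import math

def barrier_objective(x, w, c, mu):
    # c^T x + mu * (sum_j log x_j + sum_i log w_i)
    return (sum(ci * xi for ci, xi in zip(c, x))
            + mu * (sum(math.log(v) for v in x)
                    + sum(math.log(v) for v in w)))
```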
The central path
Lagrange multipliers
Barrier problem:
Maximize c⊺x + μ ∑_j log x_j + μ ∑_i log w_i
subject to Ax + w = b
Recall from calculus: Lagrange multipliers allow
us to maximize a function subject to equality
constraints.
Result
After differentiations and manipulations:
𝐴𝑥 + 𝑤 = 𝑏
𝐴⊺ 𝑦 − 𝑧 = 𝑐
𝑋𝑍𝑒 = 𝜇𝑒
𝑌𝑊𝑒 = 𝜇𝑒
Notation:
• X = diag(x_1, x_2, …, x_n), and similarly Y, Z, W.
• e is the all-ones vector.
Existence of solution
Theorem: If primal feasible set has nonempty
interior and is bounded, then for each 𝜇 > 0
there is a unique solution (𝑥𝜇 , 𝑤𝜇 , 𝑦𝜇 , 𝑧𝜇 ) to the
system
𝐴𝑥 + 𝑤 = 𝑏
𝐴⊺ 𝑦 − 𝑧 = 𝑐
𝑋𝑍𝑒 = 𝜇𝑒
𝑌𝑊𝑒 = 𝜇𝑒
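For intuition, consider the one-variable LP "maximize c·x subject to x + w = b, x, w ≥ 0" (my own toy instance; the optimum is x = b). The four conditions collapse to a quadratic in x, and the central-path point x_μ tends to the optimum as μ → 0:

```python
import math

def central_path_x(mu, b=2.0, c=1.0):
    # Conditions: x + w = b,  y - z = c,  x*z = mu,  y*w = mu.
    # Eliminating w, y, z gives  c*x^2 - (c*b - 2*mu)*x - mu*b = 0;
    # take the positive root.
    p = c * b - 2.0 * mu
    return (p + math.sqrt(p * p + 4.0 * c * mu * b)) / (2.0 * c)
```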
Path following algorithm
Given current 𝑥, 𝑤, 𝑦, 𝑧 > 0:
1. Estimate value for 𝜇.
2. Compute step directions (Δ𝑥, Δw, Δ𝑦, Δ𝑧)
pointing approximately at point (𝑥𝜇 , 𝑤𝜇 , 𝑦𝜇 , 𝑧𝜇 )
on the central path.
3. Compute a step length θ such that (x̃, w̃, ỹ, z̃) =
(x, w, y, z) + θ · (Δx, Δw, Δy, Δz) > 0.
4. Replace (x, w, y, z) by (x̃, w̃, ỹ, z̃) and continue.
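Step 3 is a ratio test. A minimal sketch (my own helper, with a typical damping factor so the new point stays strictly inside the positive orthant):

```python
def step_length(point, delta, damping=0.9):
    # Largest theta with point + theta*delta > 0 componentwise,
    # capped at 1 and damped to keep coordinates strictly positive.
    theta = 1.0
    for p, d in zip(point, delta):
        if d < 0.0:
            theta = min(theta, -p / d)
    return damping * theta
```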
Key points
1. Balance making progress against staying safely
away from the boundary, so the iterates are not
trapped near suboptimal solutions.
2. Set up a system of equations and approximate it
by dropping the non-linear terms. (This is
equivalent to Newton's method.)
Convergence
Repeat until the current solution satisfies (up to a tolerance):
Primal feasibility: b − Ax − w = 0
Dual feasibility: c − A⊺y + z = 0
Complementarity: z⊺x + y⊺w = 0
Convergence theorem
Essentially:
• Primal and dual infeasibility are decreased by a
factor (1 − t) in each iteration.
• Complementarity is decreased by a factor
(1 − t) in each iteration.
Further technical details omitted.
Unique feature of interior point
algorithms
• The polyhedral structure of the set of feasible
solutions is largely irrelevant.
• What we need is a set of manageable
equations expressing strong duality.
• We have strong duality for smooth convex
non-linear optimization problems in general.
• This is utilized in many (practical) applications,
such as support vector machines in machine
learning.