Fastest Isotonic Regression Algorithms
Quentin F. Stout
[email protected]
www.eecs.umich.edu/~qstout/
Version as of August 2014
Below are tables of the fastest known isotonic regression algorithms for various Lp metrics and partial
orderings. Throughout, “fastest” means in O-notation, not in measured running times of implementations. I’ve
cited the first paper to give a correct algorithm with the given time bound, to the best of my knowledge.
In some cases I’ve listed two if they appeared nearly contemporaneously. I’ve only included algorithms for
exact calculations (to within numerical accuracy), not approximations. I think I’ve included all orderings of
practical interest. Feel free to contact me about corrections or updates.
This is a list of the fastest known algorithms, not a historical review or a survey of applications. Several
of the papers listed below have extensive references within them. I’ve also omitted related topics such as
unimodal regression, integer-valued regression, isotonic regression with constraints on the number of level
sets or the differences between adjacent ones, etc. I might include some of these in the future if there is
sufficient interest. No parallel algorithms are considered since regrettably there has been no interesting work
in this area, even though they would be useful for large data sets. Contact me if you know of such algorithms,
or want to fund their development.
A directed acyclic graph (DAG) G with n vertices V = {v1 , ..., vn } and m edges defines a partial order
over the vertices, where vi precedes vj if and only if there is a path from vi to vj . It is assumed that G
is connected, and hence m ≥ n − 1. If it isn’t connected then the algorithms would be applied to each
component independently of the others. A function z = (z1 . . . zn ) on G is isotonic if whenever vi precedes
vj , then zi ≤ zj . By data (y, w) on G we mean there is a weighted value (yi , wi ) at vertex vi , 1 ≤ i ≤ n,
where yi is an arbitrary real number and wi , the weight, is ≥ 0. For 1 ≤ p ≤ ∞, given data (y, w) on G, an
Lp isotonic regression of the data is a real-valued isotonic function z over V that minimizes
    ( Σ_{i=1}^{n} w_i |y_i − z_i|^p )^(1/p)    for 1 ≤ p < ∞
    max_{1 ≤ i ≤ n} w_i |y_i − z_i|            for p = ∞
among all isotonic functions. The Lp regression error is the value of this expression.
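As a concrete reading of this definition, here is a minimal Python sketch (illustrative only; the function names and the edge-list representation of G are not from any of the cited papers) that checks whether a candidate z is isotonic and evaluates its Lp error. Checking each edge suffices, since the condition for all pairs connected by a path then follows by transitivity.

    import math

    def lp_error(y, w, z, p):
        """Lp regression error of candidate values z for data (y, w); p is a
        real number with 1 <= p, or math.inf for the L-infinity case."""
        if p == math.inf:
            return max(wi * abs(yi - zi) for yi, wi, zi in zip(y, w, z))
        return sum(wi * abs(yi - zi) ** p for yi, wi, zi in zip(y, w, z)) ** (1.0 / p)

    def is_isotonic(edges, z):
        """True if z[i] <= z[j] for every DAG edge (i, j); by transitivity this
        is equivalent to requiring it whenever v_i precedes v_j."""
        return all(z[i] <= z[j] for i, j in edges)
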
The orderings listed in the tables are linear (also known as total), tree, points in multidimensional space,
and general (i.e., an algorithm that applies to all orderings). A DAG of points in multidimensional space
is the isotonic version of multivariate regression. In d-dimensional space (the “dim” orderings), point p =
(p1 , . . . , pd ) precedes point q = (q1 , . . . , qd ) iff pi ≤ qi for all 1 ≤ i ≤ d. In some settings, q is said to
dominate p. In the tables the multidimensional orderings are further subdivided into regular grids and points
in arbitrary positions, and into dimension 2 and dimension ≥ 3, since different algorithms apply in these
cases. The constants in the O-notational analyses depend on
d but in general the papers do not explicitly determine them.
The metrics considered are L1 , L2 , and L∞ . These metrics also go under a variety of other names: L1 is
also known as Manhattan or taxi-cab distance, median regression, least absolute deviation; L2 is squared error
regression or Euclidean distance; and L∞ is also known as minimax optimization, uniform metric, Chebyshev
distance, supremum, or maximum absolute deviation. Some results for general Lp appear in [12].
The tables list the best time known for the given ordering and metric, and citations to the references where
the algorithm is described, or to a relevant note. All times are worst-case.
                             weighted                            unweighted
                             time                 reference      time                                    reference
linear                       Θ(n log n)           [1, 9]         Θ(n log n)                              W
tree                         Θ(n log n)           [12]           Θ(n log n)                              W
2-dim grid                   Θ(n log n)           [12]           Θ(n log n)                              W
2-dim arbitrary, note d)     Θ(n log^2 n)         [12]           Θ(n log^2 n)                            W
d ≥ 3 grid                   Θ(n^2 log n)         A              Θ(n^1.5 log^(d+1) n)                    [13]
d ≥ 3 arbitrary, note d)     Θ(n^2 log^d n)       A              Θ(n^1.5 log^(d+1) n)                    [13]
arbitrary                    Θ(nm + n^2 log n)    [2]            Θ(min{nm + n^2 log n, n^2.5 log n})     W, [12]

A: Result implied by that for arbitrary DAG
W: Result implied by that for weighted data

Table 1: L1. See Comment 4.
                             time                    reference
linear                       Θ(n)                    note a)
tree                         Θ(n log n)              [7]
2-dim grid                   Θ(n^2)                  [8]
2-dim arbitrary, note d)     Θ(n^2 log n)            [12], note c)
d ≥ 3 grid                   Θ(n^3 log n)            A
d ≥ 3 arbitrary, note d)     Θ(n^3 log^d n)          A
arbitrary                    Θ(n^2 m + n^3 log n)    [6, 12], note b)

A: Result implied by that for arbitrary DAG

Table 2: L2, no improvements known for unweighted data. Also see Comment 3.
                             weighted                               unweighted
                             time                 reference         time                 reference
linear                       Θ(n)                 [14]              Θ(n)                 A
tree                         Θ(n)                 [14]              Θ(n)                 A
2-dim grid                   Θ(n)                 [14]              Θ(n)                 A
2-dim arbitrary, note d)     Θ(n log n)           [14]              Θ(n log n)           A
d ≥ 3 grid                   Θ(n)                 [14]              Θ(n)                 A
d ≥ 3 arbitrary, note d)     Θ(n log^(d-1) n)     [14]              Θ(n log^(d-1) n)     A
arbitrary                    Θ(m log n)           [5, 10], note f)  Θ(m)                 note e)

A: Result implied by that for arbitrary DAG

Table 3: L∞. See Comments 4, 5.
Notes concerning the tables:
a) For linear orders the “pool adjacent violators” (PAV) approach has been repeatedly rediscovered. Apparently
the first paper to use it is by Ayer et al. [3]. For the L2 metric it is trivial to implement in linear time
(a sketch appears after these notes), and similarly for the L∞ metric with unweighted data, while for the
other metrics appropriate data structures are needed. Previously the fastest known algorithm for weighted L∞
used this approach, taking Θ(n log n) time, but now the fastest takes Θ(n) time and is not based on PAV [14].
b) Maxwell and Muckstadt’s algorithm [6], with a small correction by Spouge, Wan, and Wilbur [8],
takes Θ(n^4) time. The algorithm in [12] takes Θ(n^2 m + n^3 log n), which is an improvement when
m = o(n^2).
c) For 2-dimensional points with arbitrary placement, [12] shows how to use a balanced tree to simulate
the 2-dimensional grid algorithms for the L1 and L2 metrics to obtain the indicated results.
d) For points in arbitrary position and d ≥ 3 (and d = 2 for the L∞ metric), the results are based on
embedding them into a DAG which has more vertices but which can represent the domination ordering
using relatively few edges. The best result for general sparse DAGs is then applied to this new DAG.
This was first used in the original version of [10], but was subsequently moved to [13]. The new DAG
has Θ(n log^(d-1) n) edges and vertices, and can be constructed in time linear in its size. The time
analysis for L2 uses the fact that the number of iterations needed is linear in n, rather than linear in
the size of the new DAG. For L1 and L2 the analyses also use the fact that a minimum cost flow step
in [2] uses a number of steps linear in the number of vertices with nonzero weight, which too is n rather
than the size of the new DAG. If one instead used the natural representation of domination ordering for
points in arbitrary position there would be Θ(n^2) edges, and so the bounds for general dense orderings
would apply.
e) For unweighted data the time for L∞ is Θ(m) since the regression value at a vertex can be chosen to
be the average of the maximum data value over its predecessors (including itself) and the minimum over
its successors (including itself). This can be computed via topological sorting (a sketch appears after
these notes) and has been repeatedly rediscovered.
f) The algorithm in [10] is a modest improvement of the algorithm of Kaufman and Tamir [5], changing the
time from Θ(m log n + n log^2 n) to Θ(m log n). This is faster for sparse graphs where m = o(n log n),
which is relevant for all of the other orderings considered. Unfortunately, the approach is based on
parametric search, which is decidedly impractical.
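To illustrate note a), here is a minimal sketch of PAV for the weighted L2 metric on a linear order (illustrative only; the function name and interface are not from [3] or any other cited paper). Each block keeps its weighted mean, and adjacent blocks are merged while they violate the ordering; the total time is Θ(n) since each value is merged into a block at most once.

    def pav_l2(y, w):
        """Weighted L2 isotonic regression on a linear order via pool adjacent
        violators (PAV).  Returns the fitted values in Theta(n) total time."""
        blocks = []                      # each block: (weighted mean, weight, count)
        for yi, wi in zip(y, w):
            mean, weight, count = yi, wi, 1
            # Merge with preceding blocks while the isotonic constraint is violated.
            while blocks and blocks[-1][0] > mean:
                m, wt, c = blocks.pop()
                total = wt + weight
                mean = (m * wt + mean * weight) / total
                weight, count = total, c + count
            blocks.append((mean, weight, count))
        fit = []                         # expand each block's mean to its pooled points
        for mean, _, count in blocks:
            fit.extend([mean] * count)
        return fit

For instance, pav_l2([1, -1, 0], [1, 1, 1]) yields 0, 0, 0; compare the discussion of this data under the L∞ metric in Comment 4.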
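Similarly, here is a sketch of note e)’s Θ(m) computation for unweighted L∞ on a DAG (again illustrative; the function name and edge-list interface are assumptions of the sketch). It computes a topological order with Kahn’s algorithm, then takes one forward pass for the largest data value among predecessors and one backward pass for the smallest among successors.

    from collections import deque

    def linf_unweighted(values, edges, n):
        """Unweighted L-infinity isotonic regression on a DAG with vertices
        0..n-1: each fitted value is the average of the maximum data value over
        predecessors (including the vertex) and the minimum over successors."""
        succ = [[] for _ in range(n)]
        indeg = [0] * n
        for i, j in edges:
            succ[i].append(j)
            indeg[j] += 1
        order, queue = [], deque(v for v in range(n) if indeg[v] == 0)
        while queue:                        # Kahn's algorithm for a topological order
            v = queue.popleft()
            order.append(v)
            for u in succ[v]:
                indeg[u] -= 1
                if indeg[u] == 0:
                    queue.append(u)
        max_pred = list(values)
        for v in order:                     # forward pass: max over predecessors
            for u in succ[v]:
                max_pred[u] = max(max_pred[u], max_pred[v])
        min_succ = list(values)
        for v in reversed(order):           # backward pass: min over successors
            for u in succ[v]:
                min_succ[v] = min(min_succ[v], min_succ[u])
        return [(max_pred[v] + min_succ[v]) / 2 for v in range(n)]

On the linear-order data 1, -1, 0 this returns 0, 0, 0.5, the non-strict regression discussed in Comment 4.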
Comments:
1. Most of the entries have changed since I first posted this in 2009, partially because I happened to
develop some new algorithms. Many of these incorporate prior results that are referenced in the papers.
Please let me know of faster algorithms so that I can incorporate them.
The most recent major update to this document was in August 2014. Among other things, I changed
to pdf, instead of html, because it simplified keeping the cross-indexing up to date and the O-notation
looks better. I also updated most of the entries for weighted L∞ . I hadn’t planned on working on
them, and had assumed that the previous results were optimal (e.g., Θ(n log n) for the linear order), but
stumbled across a helpful approach [14] while working on a different problem.
2. While many isotonic regression algorithms have been published, it appears that few efficient, exact ones
have been implemented. I’m as guilty of this as anyone. Some of them would require significant work,
and for weighted L∞ regression on an arbitrary DAG the result is purely of theoretical interest since
it relies on parametric search. I’m not sure which of the published ones would be faster in practice,
e.g., how large does n need to be before the Θ(n) algorithm for linear orders [14] is faster than the
simpler Θ(n log n) prefix approach [10]? However, some of the implementations I’ve seen are of
slower algorithms even when the one faster in O-notation is also likely to be faster in practice and about
the same difficulty to program. I might start providing pointers to good implementations, so if you have
some favorites please send me a pointer to them.
3. For DAGs other than linear and trees the fastest L2 algorithms use a recursive approach based on
splitting the vertices into those where the regression value is above the average, and those where it is
below. In the worst case, at each step either almost all vertices have a regression value larger than the
average, or almost all are less. This introduces a factor of n which in practice is likely to be closer
to log n. Reducing the time of the L2 algorithms, and implementing them efficiently, is the change of
most importance to people who want to use isotonic regression.
4. In general there are many optimal isotonic regressions when the L1 and L∞ metrics are used. A natural
“best best” regression is the limit as p → 1 or p → ∞, respectively, of Lp isotonic regression. This
doesn’t appear to have a name, so I’ve called it “strict isotonic regression”. The standard approach for
unweighted L∞ , setting the regression value at a vertex to be the average of the largest preceding data
value (including it) and smallest following one (including it), does not produce strict isotonic regression.
For example, the regression of data 1, -1, 0 would be 0, 0, 0.5, i.e., there is an unnecessary error at the
last vertex. The isotonic regression for all other Lp metrics would set the last value to 0, hence the strict
L∞ isotonic regression would do so as well.
For L∞ the fastest known algorithms for computing the strict regression take Θ(n log n) time for linear
and tree orderings and Θ(TC(n, m)) time for a general ordering, where TC(n, m) is the time required
to determine the transitive closure of a DAG of n vertices and m edges. The best bound known for
TC(n, m) is Θ(min{nm, n^2.376}). These algorithms appear in [11], along with an analysis of mathematical reasons for preferring strict isotonic regression.
Apparently no algorithms have been published for strict L1 isotonic regression. Most L1 algorithms,
when applied to the unweighted data 1, 0, 1, would produce 1, 1, 1 or 0, 0, 1, while the strict regression
is 0.5, 0.5, 1. PAV can be used to produce the strict L1 isotonic regression for linear and tree orderings,
but often can only produce an approximation accurate to within machine error. This occurs even if
the data is unweighted with only integer values. The difficulty has to do with determining the proper
median of, say, 10, 2, 1, 0 vs. 5, 2, 1, 0. In both cases the strict median is between 1 and 2, but it is
slightly larger in the former case. Jackson [4] gives a formula for the strict median.
5. Currently, all of the fastest algorithms for weighted L∞ isotonic regression use an indirect approach,
based on queries determining if there is an isotonic regression with error ≤ ǫ, and, if there is, producing
one. A search is used to find the minimum such ǫ. Unfortunately the results, while optimal, are not
always appealing. For example, using the algorithms as they are described, the unweighted data 1,
-1, 3 results in 0, 0, 2, while some users would prefer 0, 0, 3. The latter is the strict regression (see
Comment 4).
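To make the feasibility query in Comment 5 concrete, here is a minimal sketch of one natural version of it (an illustrative formulation, not the machinery of [5, 10], or [14]): vertex i may take any value in the interval [y_i − ǫ/w_i, y_i + ǫ/w_i], and propagating the lower endpoints forward through the DAG gives the pointwise-smallest isotonic function meeting them, which is feasible exactly when it never exceeds an upper endpoint. The sketch assumes positive weights and that the vertex numbering 0..n−1 is already a topological order (every edge (i, j) has i < j).

    def linf_feasible(y, w, edges, eps):
        """Return an isotonic z with w[k] * |y[k] - z[k]| <= eps for all k, or
        None if no such z exists.  Assumes positive weights and that the vertex
        numbering is a topological order of the DAG (edge (i, j) has i < j)."""
        lo = [yk - eps / wk for yk, wk in zip(y, w)]
        hi = [yk + eps / wk for yk, wk in zip(y, w)]
        z = list(lo)                    # start from the smallest allowed values
        for i, j in sorted(edges):      # tails are finalized before their heads
            if z[j] < z[i]:
                z[j] = z[i]             # push lower bounds forward along the order
        if any(zk > hk for zk, hk in zip(z, hi)):
            return None                 # some vertex is forced above its upper bound
        return z

A search over ǫ built on such a query locates the minimum feasible error; the algorithms cited in the table find the optimal ǫ more cleverly, which is where parametric search enters for arbitrary DAGs.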
References
[1] Ahuja, RK and Orlin, JB (2001), “A fast scaling algorithm for minimizing separable convex functions
subject to chain constraints”, Operations Research 49, pp. 784–789.
[2] Angelov, S, Harb, B, Kannan, S, and Wang, L-S (2006), “Weighted isotonic regression under the L1
norm”, Symposium on Discrete Algorithms (SODA), pp. 783–791.
[3] Ayer, M, Brunk, HD, Ewing, GM, Reid, WT, and Silverman, E (1955), “An empirical distribution
function for sampling with incomplete information”, Annals of Mathematical Statistics 26, pp. 641–647.
[4] Jackson, D (1921), “Note on the median of a set of numbers”, Bulletin of the American Mathematical
Society 27, pp. 160–164.
[5] Kaufman, Y and Tamir, A (1993), “Locating service centers with precedence constraints”, Discrete
Applied Mathematics 47, pp. 251–261.
[6] Maxwell, WL and Muckstadt, JA (1985), “Establishing consistent and realistic reorder intervals in
production-distribution systems”, Operations Research 33, pp. 1316–1341.
[7] Pardalos, PM and Xue, G (1999), “Algorithms for a class of isotonic regression problems”, Algorithmica
23, pp. 211–222.
[8] Spouge J, Wan H, and Wilbur WJ (2003), “Least squares isotonic regression in two dimensions”, Journal
of Optimization Theory and Applications 117, pp. 585–605.
[9] Stout, QF (2008), “Unimodal regression via prefix isotonic regression”, Computational Statistics and
Data Analysis 53, pp. 289–297. www.eecs.umich.edu/~qstout/abs/UniReg.html
A preliminary version appeared in “Optimal algorithms for unimodal regression”, Computing and Statistics 32, 2000.
[10] Stout, QF (2011), “Weighted L∞ isotonic regression”, submitted.
www.eecs.umich.edu/~qstout/abs/LinfinityIsoReg.html
This is a major revision of the original version that was posted on the web in 2008. Some of the material
in that paper was moved to [13].
[11] Stout, QF (2012), “Strict L∞ isotonic regression”, Journal of Optimization Theory and Applications
152, pp. 121–135. www.eecs.umich.edu/~qstout/abs/StrictIso.html
[12] Stout, QF (2013), “Isotonic regression via partitioning”, Algorithmica 66, pp. 93–112.
www.eecs.umich.edu/~qstout/abs/L1IsoReg.html
[13] Stout, QF (2013), “Isotonic regression for multiple independent variables”, Algorithmica, to appear.
www.eecs.umich.edu/~qstout/abs/MultidimIsoReg.html
[14] Stout, QF (2014), “L∞ isotonic regression for linear and multidimensional orders”, in preparation.
www.eecs.umich.edu/~qstout/abs/LinftyIsoRegLinear.html
Copyright 2009–2014 Quentin F. Stout