Download Finding the closest pair of points Problem definition Notation

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
CS 545
Finding the closest pair of
points
Samir Khuller, Yossi Matias:
A Simple Randomized Sieve Algorithm for the
Closest-Pair Problem
Inf. Comput. 118(1): 34-37 (1995)
Alon Efrat
Problem definition
Notation
Given: A set S={p1,…pn} of n points in the plane
Problem: Find the pair pipj that minimizes d(p,ipj), where d(pi , pj)
is the Euclidean distance between pi and pj .
p1
Let Si={p1, p2, … pi}
Let d( Si ) denote the distance between the closest pair in Si
Clearly d(S2 ) ≥ d(S3 ) ≥ d(S4 ) ≥ …. ≥ d(Sn )
Idea – incremental algorithm – compute d(Si+1 ) from d(Si )
p1
p8
p8
d(S3)/2=d(p1, p2)/2
In this talk – a randomized algorithm whose expected running time is
O(n)
Properties of Γ( Si ).
Let Γ( Si ) denote an axis-parallel grid, where the edge-length of each
grid-cell is d( Si )/2, and one of its corner is on the point (0,0)
Locating points.
p1
p8
p9
p2
p3
p3
O(n2) time algorithm – trivial
Ω( n log n ) bound for any deterministic algorithm.
p1
p4
p2
p2
p4
p3
Γ(S3)
d(S3)/2=d(p1, p2)/2
Claim 1: there is at most one point of Si inside every cell of Γ( Si ).
Proof – if there are two, then the distance between them is smaller
than the length of the diagonal of the cell, which is
( 2 )d ( Si ) / 2 = d ( S i ) / 2 < d ( S i )
p8
p9
p2
p4
p3
d(S3)/2=d(p1, p2)/2
Claim 2: given d(Si) we can place all points of Si in a data
structure H (Si), such that we can (in O(1) expected time)
1) insert a new point pj
2) Given a query point q find if there is a point of Si in
the cell of Γ( Si ) containing q.
The structure H(Si) is described in HW1.
Deciding if d(Si)>d(Si+1)
Rehashing with d(Si)
p1
p8
p9
p2
p1
p4
d(S3)/2=d(p1, p2)/2
i=3
p3
d(S3)/2=d(p1, p2)/2
Procedure Rehashing with d(Si) :
Construct the hash table H(Si) (from HW1) with d(Si), and
inserting all points of Si into the table.
Expected time O(| Si|).
To decide whether d(Si)>d(Si+1) or d(Si )=d(Si+1) do
find all points of Si in the cell containing pi+1…
and in all the cells whose distance from this cell < d(Si)
Measure the distance from pi+1 to each of these points.
Note – only a constant number of cells, and due to Claim 1, only
a constant number of points. Altogether: (expected) constant
time .
Algorithm – version 1
Algorithm – version 2
Create random permutation of the points of S before calling the
algorithm of version 1.
p1
Input: S
Output: d(S), The
closest pair of S
p4
p2
p3
p2
p4
p3
Find d(S2), and construct H(S2).
For i=2,4…n-1 do {
Use H(Si) to decide whether d(Si)>d(Si+1) or d(Si)=d(Si+1)
If d(Si)=d(Si+1) then Γ ( Si+1 ) = Γ ( S ) ; insert pi+1 ; and
H( Si+1 ) = H( Si ). No work is needed. (constant time)
Else /*d(Si)>d(Si+1)*/ rehash with d(Si+1). (O(i) expected time)
p1
Assumption: the closest
pair is unique.
p6
p2
ap
7
p4
b p6
p3
p5
Claim 3: The probability that d(Si)>d(Si+1) is 2/(i+1).
Proof: There are (i+1) points, two are special (determining the
closest pair. All permutations are equally likely, so the
probability that one of the special pair appears last in the
permutation is 2/(i+1).
}
Running time: Worst case 1+2+3+…(n-1) = O(n2)
Finishing the analysis
So in the i’th stage we are spending O(1) time with probability
(i-1)/(i+1), and O(i) time with probability 2/(i+1), so the
expected work in this stage is O(i) 2/(i+1) = O(1).
Expected time
Let Ti denote the expected time at stage i. Then
Ti=1 with probability (i-1)/(i+1) and Ti=i with probability
2/(i+1)
∞
E (Ti ) = ∑ i Pr(Ti = i) = 1 Pr(Ti = i) + i Pr(Ti = i) = 2
Hence the total expected time is O(n) .
This argument can be done more formal using random variables.
i =1
n
Hence the total time
is
n
n
i =3
i =3
E (∑ (Ti )) = ∑ E (Ti = i ) = ∑ 2 = O (n)
i =3
Related documents