
CS 545
Finding the closest pair of points
Alon Efrat

Samir Khuller, Yossi Matias: A Simple Randomized Sieve Algorithm for the Closest-Pair Problem. Inf. Comput. 118(1): 34-37 (1995)

Problem definition
Given: a set S = {p_1, …, p_n} of n points in the plane.
Problem: find the pair p_i, p_j that minimizes d(p_i, p_j), where d(p_i, p_j) is the Euclidean distance between p_i and p_j.
An O(n^2)-time algorithm is trivial: compare all pairs.
There is an Ω(n log n) lower bound for any deterministic algorithm (in the algebraic decision-tree model).
In this talk: a randomized algorithm whose expected running time is O(n).

Notation
Let S_i = {p_1, p_2, …, p_i}.
Let d(S_i) denote the distance between the closest pair in S_i.
Clearly d(S_2) ≥ d(S_3) ≥ d(S_4) ≥ … ≥ d(S_n).
Idea: an incremental algorithm that computes d(S_{i+1}) from d(S_i).

Properties of Γ(S_i)
Let Γ(S_i) denote an axis-parallel grid in which every cell has edge length d(S_i)/2 and one cell corner lies on the point (0,0).
[Figure: the grid Γ(S_3) over the points p_1, …, p_4, with cell edge length d(S_3)/2 = d(p_1, p_2)/2]
Claim 1: there is at most one point of S_i inside every cell of Γ(S_i).
Proof: if a cell contained two points of S_i, the distance between them would be at most the length of the cell's diagonal, which is
$$\sqrt{2}\cdot\frac{d(S_i)}{2} = \frac{d(S_i)}{\sqrt{2}} < d(S_i),$$
contradicting the definition of d(S_i) as the smallest distance between two points of S_i.

Locating points
Claim 2: given d(S_i), we can place all points of S_i in a data structure H(S_i) such that, in O(1) expected time, we can
1) insert a new point p_j;
2) given a query point q, find whether there is a point of S_i in the cell of Γ(S_i) containing q.
The structure H(S_i) is described in HW1.

Rehashing with d(S_i)
Procedure "Rehashing with d(S_i)": construct the hash table H(S_i) (from HW1) for the grid of cell edge d(S_i)/2, and insert all points of S_i into the table. Expected time O(|S_i|).
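The slides defer the structure H(S_i) to HW1, so its actual design is not shown here. As a minimal sketch of what Claim 2 and the rehashing procedure ask for, one could assume that a Python dict keyed by integer cell coordinates stands in for the hash table. The names `cell_of`, `rehash`, and `point_in_cell_of` are hypothetical and chosen only for illustration.

```python
import math

def cell_of(p, d):
    """Integer (column, row) of the cell of the grid with edge d/2 that contains p.

    The grid has a corner at (0, 0), so flooring each coordinate divided by
    the edge length identifies the cell.
    """
    side = d / 2.0
    return (math.floor(p[0] / side), math.floor(p[1] / side))

def rehash(points, d):
    """Build a table for the grid of edge d/2: occupied cell -> its point.

    Assumes d is the closest-pair distance of `points`, so by Claim 1 no two
    points share a cell. Takes expected O(len(points)) time with a dict.
    """
    table = {}
    for p in points:
        table[cell_of(p, d)] = p
    return table

def point_in_cell_of(table, q, d):
    """Return the stored point in the cell containing q, or None if it is empty."""
    return table.get(cell_of(q, d))

# Example: three points whose closest pair is at distance 1, so cells have edge 0.5.
S3 = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
H = rehash(S3, 1.0)
hit = point_in_cell_of(H, (1.2, 0.3), 1.0)   # same cell as (1.0, 0.0)
miss = point_in_cell_of(H, (0.6, 0.1), 1.0)  # empty cell
```

Inserting a new point is a single dict assignment, and a query is a single dict lookup, which is where the O(1) expected time of Claim 2 would come from under this assumption.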
Deciding if d(S_i) > d(S_{i+1})
To decide whether d(S_i) > d(S_{i+1}) or d(S_i) = d(S_{i+1}):
1) Find all points of S_i in the cell containing p_{i+1}, and in all the cells whose distance from this cell is less than d(S_i).
2) Measure the distance from p_{i+1} to each of these points.
Note: only a constant number of cells is examined and, by Claim 1, each holds at most one point, so only a constant number of points is examined. Altogether: (expected) constant time.

Algorithm: version 1
Input: S
Output: d(S), the closest pair of S
Find d(S_2), and construct H(S_2).
For i = 2, 3, …, n-1 do {
    Use H(S_i) to decide whether d(S_i) > d(S_{i+1}) or d(S_i) = d(S_{i+1}).
    If d(S_i) = d(S_{i+1}) then Γ(S_{i+1}) = Γ(S_i); insert p_{i+1}, so that H(S_{i+1}) is H(S_i) with p_{i+1} added. No rehashing is needed. (constant time)
    Else /* d(S_i) > d(S_{i+1}) */ rehash with d(S_{i+1}). (O(i) expected time)
}
Running time: worst case 1 + 2 + 3 + … + (n-1) = O(n^2).

Algorithm: version 2
Create a random permutation of the points of S before calling the algorithm of version 1.
Assumption: the closest pair is unique.
Claim 3: the probability that d(S_i) > d(S_{i+1}) is 2/(i+1).
Proof: S_{i+1} has i+1 points, two of which are special (they determine the closest pair of S_{i+1}). We have d(S_i) > d(S_{i+1}) exactly when p_{i+1} is one of the two special points. All permutations are equally likely, so the probability that one of the special pair appears last among these i+1 points is 2/(i+1).

Expected time
In the i-th stage we spend O(1) time with probability (i-1)/(i+1), and O(i) time with probability 2/(i+1), so the expected rehashing work in this stage is O(i) · 2/(i+1) = O(1). Hence the total expected time is O(n).

Finishing the analysis
This argument can be made more formal using random variables. Let T_i denote the time spent at stage i, in units of the constant cost of one step. Then T_i = 1 with probability (i-1)/(i+1) and T_i = i with probability 2/(i+1), so
$$E(T_i) = 1\cdot\Pr(T_i = 1) + i\cdot\Pr(T_i = i) = \frac{i-1}{i+1} + \frac{2i}{i+1} = \frac{3i-1}{i+1} < 3.$$
Hence, by linearity of expectation, the total expected time is
$$E\Big(\sum_{i=2}^{n-1} T_i\Big) = \sum_{i=2}^{n-1} E(T_i) < \sum_{i=2}^{n-1} 3 = O(n).$$
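The pieces above can be put together as the following sketch of version 2. It is an illustration under stated assumptions rather than the course's reference solution: a Python dict plays the role of the hash table from HW1, the input has at least two points, and the name `closest_pair` and its `seed` parameter are hypothetical. The cells "whose distance from this cell is less than d(S_i)" are taken to be the 5 × 5 block centered on the cell of the new point, since any point closer than d(S_i) lies at most two cells away along each axis.

```python
import math
import random

def closest_pair(points, seed=None):
    """Randomized incremental closest pair; returns (distance, (p, q)).

    Assumes len(points) >= 2. A dict keyed by integer cell coordinates is an
    assumed stand-in for the hash table H(S_i).
    """
    pts = list(points)
    random.Random(seed).shuffle(pts)           # version 2: random insertion order

    def cell(p, d):
        side = d / 2.0                         # cell edge is d(S_i)/2
        return (math.floor(p[0] / side), math.floor(p[1] / side))

    def rehash(prefix, d):
        # Claim 1: with d = d(prefix), every cell holds at most one point.
        return {cell(p, d): p for p in prefix}

    d = math.dist(pts[0], pts[1])              # d(S_2)
    pair = (pts[0], pts[1])
    if d == 0:                                 # duplicate points: nothing smaller exists
        return 0.0, pair
    table = rehash(pts[:2], d)

    for i in range(2, len(pts)):
        q = pts[i]
        cx, cy = cell(q, d)
        best, best_pair = d, pair
        # Scan the 5 x 5 block of cells around q: a constant number of lookups.
        for dx in range(-2, 3):
            for dy in range(-2, 3):
                p = table.get((cx + dx, cy + dy))
                if p is not None and math.dist(p, q) < best:
                    best, best_pair = math.dist(p, q), (p, q)
        if best < d:
            d, pair = best, best_pair
            if d == 0:
                return 0.0, pair
            table = rehash(pts[:i + 1], d)     # O(i) expected rebuild
        else:
            table[(cx, cy)] = q                # same grid, O(1) expected insert
    return d, pair
```

When the distance does not shrink, the new point's cell is guaranteed to be empty (a point already in that cell would be closer than d), so the plain dict assignment never overwrites a stored point.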
