Lecture Notes - McMaster Computing and Software
THE PROBLEM SOLVING PROCESS

MATHEMATICAL MODEL  →  INFORMAL ALGORITHM  →  PSEUDO-LANGUAGE PROGRAM (OR OTHER FORMAL DESCRIPTION)  →  PROGRAM (Pascal, C, C++, etc.)
ABSTRACT DATA TYPES  →  DATA STRUCTURES  →  PROGRAM

DATA TYPE VERSUS ABSTRACT DATA TYPE

DATA TYPE: Set of values (or objects).

ABSTRACT DATA TYPE (ADT): Set of objects + a mathematical model with a collection of operations defined on the model.
2
DATA TYPE: Set of values (or objects).
Fortran 77:
LOGICAL
INTEGER, REAL, CHARACTER/STRING
Composite types: Array of Integers
Array of reals
Etc.
Pascal:
Basic types: integer, real, character, Boolean
Composite types:
array of integers
Array of characters
Etc.
Record of integers/reals/characters
Etc.
Set of….
File of…
 THERE ARE OPERATIONS ASSOCIATED WITH EACH
TYPE
 AGGREGATING TOOLS: array, record, file
3
C:
Basic types: int, real, char
Composite types: arrays, structures
WHAT ABOUT POINTERS?
Pointer can be treated as a data type, but usually it’s treated
as a DATA STRUCTURING FACILTY.
4
ABSTRACT DATA TYPE (ADT)

Set of objects plus a mathematical model with a collection of operations defined on the model.

Example: List a1, a2, …, an

LIST (of integers) is an ADT with the following operations:
1. Calculate the length of the list
2. Get the first member of the list and return null if empty
3. Retrieve the member at position P and return null if P doesn't exist
4. Locate X in the list
5. Insert X into the list at position P
6. Delete the member at position P
5
Example:   P = 1   2   3   4   5   6   7   8
           L = 50, 60, 23, 47, 21, 39, 60, 40

1. LENGTH(L) = 8
2. FIRST(L) = 50
3. RETRIEVE(4, L) = 47
   RETRIEVE(9, L) = null
4. LOCATE(60, L) = 2
5. INSERT(30, 5, L) gives the result:
   L = 50, 60, 23, 47, 30, 21, 39, 60, 40
6. DELETE(3, L) gives the result:
   L = 50, 60, 47, 30, 21, 39, 60, 40

ALL OPERATIONS ARE ATOMIC EXCEPT FIRST, SINCE
FIRST(L) = RETRIEVE(1, L)
6
EXAMPLE: ADT STACK (OF INTEGERS)

1. Retrieve the top element (TOP)
2. Delete the top element (POP)
3. Insert x at the top (PUSH)
4. Test if the stack is empty (EMPTY)

S = 27, 40, 32    (top element first)

1. TOP(S) = 27
2. POP(S) results in  S = 40, 32
3. PUSH(0, S) results in  S = 0, 27, 40, 32
4. EMPTY(S) = false
7
Example: ADT MATRIX (OF REALS)
1. Return number of rows
2. Return number of columns
3. Multiply matrices A and B
4. Add A and B
5. Compute the transpose of matrix A
6. Delete a rows/column
7. Add a row/column
8. Multiply matrix A by real number 6
OBSERVATIONS:
1. Domain of an operation may involve more than one ADT
Type
2. Some operations are partial
3. Range of an operation may be a different ADT
8
A simple application of ADTs – evaluation of arithmetic expressions
a + b*c/d**e + f

Algorithm Value (x : expression);
  oprnd : STACK OF REALS
  optor : STACK OF CHARS
  x1, x2 : REAL
  i : INTEGER
  Initialize oprnd and optor
  for i := 1 to LEN(x) do
    case x[i] of
      real: PUSH(x[i], oprnd)
      char: if TOP(optor) < x[i] then     { operator on top has lower precedence }
              PUSH(x[i], optor)
            else begin
              repeat
                x2 := TOP(oprnd); POP(oprnd);
                x1 := TOP(oprnd); POP(oprnd);
                x1 := x1 TOP(optor) x2;    { apply the operator on top of optor }
                PUSH(x1, oprnd);
                POP(optor)
              until TOP(optor) < x[i];
              PUSH(x[i], optor)
            end
    endcase
  Value := TOP(oprnd)
9
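The following is a minimal C sketch of the two-stack evaluation above, assuming single-digit operands and the operators + - * / only (no parentheses); the names prec, apply, reduce and the fixed-size stacks are ours, not from the notes.

#include <stdio.h>

static double oprnd[100]; static int ntop = 0;      /* operand stack  */
static char   optor[100]; static int otop = 0;      /* operator stack */

static int prec(char op) {                           /* operator precedence */
    return (op == '+' || op == '-') ? 1 : (op == '*' || op == '/') ? 2 : 0;
}
static double apply(char op, double a, double b) {
    switch (op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:  return a / b;
    }
}
static void reduce(void) {                /* pop one operator and apply it */
    double b = oprnd[--ntop], a = oprnd[--ntop];
    oprnd[ntop++] = apply(optor[--otop], a, b);
}

double value(const char *x) {
    for (int i = 0; x[i] != '\0'; i++) {
        if (x[i] >= '0' && x[i] <= '9')
            oprnd[ntop++] = x[i] - '0';               /* PUSH operand */
        else {
            /* while the operator on top has >= precedence, apply it */
            while (otop > 0 && prec(optor[otop-1]) >= prec(x[i]))
                reduce();
            optor[otop++] = x[i];                     /* PUSH operator */
        }
    }
    while (otop > 0) reduce();                        /* empty the operator stack */
    return oprnd[--ntop];
}

int main(void) {
    printf("%g\n", value("1+2*3/2+4"));               /* prints 8 */
    return 0;
}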
Comparison of ADT’s with procedure – the advantages
1.
GENERALIZATION
Procedures are generalization of primitive operations
(e.g. +, -, *,….)
ADT’s are generalizations of primitive data types.
2.
ENCAPSULATION (OR MODULARITY)
A procedure encapsulates all the statements relevant to a
certain aspect of a program.
An ADT encapsulates all the definitions and the
operations relevant to a data type.
How to implement an ADT?
Note that a data structure doesn’t have to be associated with an
ADT.
10
Data Structure: A collection of data objects connected in various
ways.
A data structure is always associated with a specific programming
language.
FORTRAN 77: the only data structuring facility is ARRAY
PASCAL: we have: ARRAY, RECORD, and POINTER
C: ARRAY, STRUCTURE, POINTER
Some important terms:
Cell:
a box capable of holding a value drawn from some basic
or composite data types (e.g. integer, record…)
CELL IS BASIC BUILDING BLOCK OF DATA STRUCTURE
Pointer: a cell whose value indicates another cell
Cursor: an integer-valued cell, used as a pointer to an array
11
Example: A simple data structure is given below.
It may be used in the implementation of ADT MATRIX.
a11
a11
a11
a22
a22
a22
am1

am1

am1
A pointer
12




A CELL
Type
Cell type = record
Element: real
Down:
cell type
Right:
cell type
End
13
Example: The data structure below may be used in the implementation of ADT LIST.

L = 7.8, 1.2, 5.6, 3.4      (header cursor = 4;  0 ≡ nil pointer)

  index   element   next
    1       1.2      3
    2       3.4      0
    3       5.6      2
    4       7.8      1

type
  cursor = integer;
  recordtype = record
    element : real;
    next : cursor
  end
14
ALGORITHM VERSUS PROGRAM

• An algorithm is a finite sequence of instructions satisfying the following criteria:
  1) Definiteness: each instruction must be clear and unambiguous.
  2) Finiteness: the algorithm will terminate after a finite number of steps for all cases.
  3) Effectiveness: each instruction can be performed using a finite amount of resources (time and space).

• A (well-defined) program in principle is similarly described, but a program:
  1) is always associated with a specific programming language;
  2) may not halt (e.g. operating systems).

All the programs we are interested in halt; pseudo-Pascal is our chosen language.

WE WILL USE "ALGORITHM" AND "PROGRAM" INTERCHANGEABLY.
15
Examples:
Proc search (x: integer; A : array [1…10] of integer)
i=1;
while x <> A[i] and I <= 10 do i:= i + 1;
search: = i
end
proc print ( S: set)
Print the elements of S
end
proc Pi
print all the digits of Pi } never ends
end
16
The running time of a program depends on
1)
2)
3)
4)
Computer speed
Compiler quality
Input to program
Program efficiency (or quality)
The TIME COMPLEXITY of a program is defined as a function
of input, usually the SIZE of input.
17
Program A is of worst-case time complexity T(n) if the maximum running time of A on any input of size n is T(n).

THE UNITS OF T(n) ARE UNSPECIFIED.

Although the constants in T(n) are important, we are more interested in the growth rate (or order) of T(n).

e.g.   2n ≈ 10n + 1000
       2n << n² when n is large

f(n) << g(n)   ↔   lim_{n→∞} f(n)/g(n) = 0
18
IMPORTANT DEFINITION

T(n) is O(f(n)) if there are constants C and n₀ such that
    T(n) ≤ C·f(n)  when  n ≥ n₀.

Note: O(f(n)) actually denotes a class of functions of the same or slower growth rate, and it would be better to write T(n) ∈ O(f(n)).

"=" here stands for "is", not "equals".
Examples:

3n² + 16n + 8 = O(n²):
    take C = 4, n₀ = 17;   n > 17  ⟹  3n² + 16n + 8 ≤ 4n².

n·log n = O(n²):
    n > 0  ⟹  n·log n < n²,  so C = 1, n₀ = 0.

3n³ − 6n² ≠ O(n²):
    for every C there is an n₀ such that  n > n₀ ⟹ 3n³ − 6n² > Cn².
    If n > 0 then  3n³ − 6n² > Cn²  ⟺  3n − 6 > C.
    Hence if n > (C + 6)/3 then 3n³ − 6n² > Cn².
20
Σ_{i=0}^{k} aᵢnⁱ = O(nᵏ)   when aₖ > 0

10⁶ = O(1) = O(2)
100n + 10⁵ = O(n)
n⁴ + n² + n + 6 = O(n⁴)
2ⁿ + n¹⁰⁰ = O(2ⁿ)
3ⁿ >> O(2ⁿ)
log₁₀ n = O(log₂ n),  since log₁₀ n = log₂ n / log₂ 10

O(f(n)) is an upper bound on the growth rate (order) of T(n) if T(n) = O(f(n)).
21
To specify a lower bound, we use Ω.

DEFINITION: T(n) is Ω(f(n)) if there is a constant C such that T(n) ≥ C·f(n) infinitely often.

½n + 100 = Ω(n):   take C = ½;  ½n + 100 ≥ ½·n for all n.

T(n) = n        if n is odd,  n ≥ 1
     = n²/100   if n is even, n ≥ 0

T(n) = Ω(n²):   with C = 1/100,  T(n) ≥ C·n²  for n = 0, 2, 4, 6, …
22
WHY IT IS IMPORTANT?
5n2
2n
n3/2
100n
3000
2000
1000
5
10
15
20
Running times of 4 programs
1000 jek ≈ 17 minutes
23
n
 HOW LARGE A PROBLEM CAN WE SOLVE?
 SUPPOSE THAT WE BUY A MACHINE THAT RUNS 10 TIMES FASTER AT NO ADDITIONAL COST. THEN FOR THE SAME COST WE CAN SPEND 10⁴ SECONDS ON A PROBLEM WHERE WE SPENT 10³ SECONDS BEFORE.

Running time   Max problem size   Max problem size   Increase in
T(n)           for 10³ sec        for 10⁴ sec        max problem size
100n                10                 100               1000%
5n²                 14                  45                320%
n³/2                12                  27                230%
2ⁿ                  10                  13                130%

THE O(2ⁿ) PROGRAM CAN SOLVE ONLY SMALL PROBLEMS NO MATTER HOW FAST THE UNDERLYING COMPUTER IS.
24
THEOREM
IF T1 (n) = 0 (f(n)) AND T2 (n) = 0 (g(n))
THEN
T1 (n) + T2 (n) = 0 (max (f (n)), g (n)).
PROOF
THERE ARE c1, n1, c2, n2 SUCH THAT
n ≥ n1  T1 (n) ≤ c1 f (n)
n ≥ n2  T2 (n) ≤ c2 g (n)
LET n3 = max (n1,n2). THEN
n ≥ n3  T1 (n) + T2 (n) ≤ c1 f (n) + c2 g (n) ≤ (c1 + c2) max (f(n), g
(n)).
ENDPROOF
HENCE: 0 ( f (n)) + 0 (g (n)) = 0 (max (f (n),g (n)))
0 ( n2) + 0 (n3) = 0 (n3)
0 (n2) + 0 (2n2) = 0 (2n2) = 0 (n2)
25
THEOREM
IF T1 (n) = 0 (f (n)) AND T2 (n) = 0 (g (n))
THEN
T1 (n) T2 (n) = 0 (f (n) g(n))
PROOF
THERE ARE c1, n1, c2, n2 SUCH THAT
n ≥ n1  T1 (n) ≤ c1 f (n)
n ≥ n2  T2 (n) ≤ c2 g (n)
LET n3 = max (n1,n2). THEN
n ≥ n3  T1 (n) T2 (n) ≤ c1 c2 f(n) g (n)
ENDPROOF
HENCE: 0 ( f (n)) 0 (g (n)) = 0 (f (n) g (n))
0 ( n2) 0 (n5) = 0 (n7)
0 (n2) 0 (2nh) = 0 (n22h) = 0 (2h+2logh)
OTHER IMPLICATIONS
f(n)
f (n)
Σ
0 (g (i, n)) = 0 ( Σ
i=1
i=1
g (i,n))
max (0 (f(n)), 0(g (n)) = 0 (max(f (n)), g(n)))
26
0 (f(n)) = 0 (g (n))
* ASYMMETRIC!
↨
0 (f (n)) ≤ 0 ( g(n))
 ═
Means
IS
MY CAT IS BLACK ≠ BLACK IS MY CAT
0
: FUNCTION → SET OF FUNCTIONS
N2
n2
2
1000 n + 5
0
DEF: 0 (f (n)
0 (f (n) == 0 (f (n)
Any Operator
+, ., ETC.
27
g (n))
0 (n2)
OTHER USEFUL RULES
f(n) ═ 0 (f (n))
C 0(f(n)) ═ 0 (f(n))
0 (f(n)) + 0 (f(n)) ═ 0(f(n))
0 (0 (f(n)) ═ 0 (f(n))
0 (f(n))0(g(n)) ═ 0 (f(n))g(n))
0 (f(n)g(n)) ═ f(n)0(g(n))
REMEMBER:
═
HERE IS ASYMMERIC!
28
CALCULATING COMPLEXITIES OF ALGORITHMS

procedure bubble (var A : array [1..n] of int);
{ bubble sort A into increasing order }
var i, j, temp : integer;
begin
(1)  for i := 1 to n-1 do
(2)    for j := n downto i+1 do
(3)      if A[j-1] > A[j] then begin
           { swap A[j-1] and A[j] }
(4)        temp := A[j-1];
(5)        A[j-1] := A[j];
(6)        A[j] := temp
         end
end;

(3)–(6)  TAKES O(1)
(2)–(6)  TAKES (n−i)·O(1) + O(1) = O(n−i)
(1)–(6)  TAKES  Σ_{i=1}^{n-1} [O(n−i) + O(1)]
               = O( Σ_{i=1}^{n-1} (n−i) ) = O(n(n−1)/2) = O(n²−n) = O(n²)
function test (m : integer) : Boolean;
{ tests if m is a power of 2, i.e. m = 2^k for some k }
begin
(1)  if m = 1 then test := true
(2)  else
(3)    if (m mod 2 = 0) then test := test(m div 2)
(4)    else
(5)      test := false
end

LET T(m) = time complexity of test.
Cost per line:  (1) → c1,  (2) → c2,  (3) → c1,  (4) → c2 + T(m/2),  (5) → c2

T(m) = c1 + c2                   m = 1
     = 2c1 + c2                  m odd, m > 1
     = 2c1 + c2 + T(m/2)         m even

A recurrence equation.
30
Define a new function:

T'(m) = c1 + c2                    m ≤ 1
      = 2c1 + c2 + T'(m/2)         m > 1

Then T(m) ≤ T'(m) for all m > 0, i.e. T'(m) is an upper bound of T(m).
Note: T'(m) is defined for all real numbers.

T'(m) = 2c1 + c2 + T'(m/2)
      = 2(2c1 + c2) + T'(m/2²)
      = 3(2c1 + c2) + T'(m/2³)
      …
      = ⌈log₂m⌉·(2c1 + c2) + T'(m/2^⌈log₂m⌉)
      = (2c1 + c2)·⌈log₂m⌉ + c1 + c2
      = O(log m)

THUS: T'(m) = O(log m),  T(m) = O(log m)
m
2[log2m]
Worst case
occurs when
M= 2k
Ceiliuy :
is the smallest integer ≥ x
[x]
→
e.g.
[1.5] = 2
[3.1] = 4
[3.0] = 3
NOTE THAT:
2[log2m] ≥ m
IF
m = 2k
[log2m] = k
log2m = k,
and 2[log2m] = m
m=6
log24 = 2 & log2 8 = 3 → 2 <log26 < 3 → [log26] =3
>m
32
→ 2[log2m]
Problem:
What is T (m) ? Is m the length of input!
M is 100, 15, 64, etc, just number!!
 IF m is BINARY and is the number of bits of m, THEN
n = [log2m]
And
i.e.
T (m) = 0 (log2m) → T (n) = T ([log2m]) = 0 (log2m)
T (n) = 0 (n)
 M CAN BE TREATED AS : 000….0,
m
I.E. m UNITS, THEN
THEN “LENGTH” OF M IS m, and
T (n) = T (m) = 0 (log n)
33
DESIGN OF A PROGRAM
 TOP – DOWN / BOTTOM – UP APPROACH
 STEPWISE REFINEMENT, COOSE ADT’S AND DATA
STRUCTURES
 CODING
A REMARK ABOUT RUNNING TIME
 ALTHOUGH THE ORDER OF RUNNING TIME IS VERY
IMPORTANT, WE SHOULD ALSO CONSIDER THE
FOLLOWING FACTORS IN PRACTIC.
1.
THE TIME IT TAKES TO WRITE AND DEBUG THE
PROGRAM
2.
READABILITY, MODULARITY, ETC. HOW HARD
IS TO MAINTAIN THE PROGRAM
3.
SOMETIMES CONSTANTS ARE ALSO IMPORTANT
4.
SPACE (OR STORAGE) COMLEXITY
5.
ACURACY
34
ADT LIST

A list is a sequence of zero or more elements of a given type (the element type).

L = a1, a2, a3, …, an
  length = n
  first = a1,  last = an
  ai is at position i;  a_{i-1} precedes a_i;  a_i follows a_{i-1}
  END(L) = position n+1

Operations:
  INSERT(x, p, L);  DELETE(p, L);
  LOCATE(x, L);  RETRIEVE(p, L);
  MAKENULL(L): L ← ε
  FIRST(L);  NEXT(p, L);  PREVIOUS(p, L);
  PRINT(L);  LENGTH(L);  REVERSE(L);
  CONCAT(L1, L2);  EMPTY(L);  etc.
35
Array implementation of lists

[Figure: elements a1 … an stored in positions 1 … last of an array of size max; positions last+1 … max are empty.]

const
  max = ?;
type
  position = 1..max;
  LIST = record
    elements : array [position] of elementtype;
    last : 0..max
  end;

function END (var L : LIST) : integer;
begin
  END := L.last + 1
end;
36
[Figure: inserting x at position p shifts a_p … a_n one place to the right.]

procedure INSERT (x : elementtype; p : position; var L : LIST);
var q : position;
begin
  if L.last = max then
    error('list is full')
  else if (p > L.last + 1) or (p < 1) then
    error('position does not exist')
  else begin
    for q := L.last downto p do
      L.elements[q+1] := L.elements[q];   { shift to the right }
    L.last := L.last + 1;
    L.elements[p] := x
  end
end;

Time complexity:  INSERT, DELETE, LOCATE – O(n)
                  RETRIEVE, NEXT, PREVIOUS, END, FIRST, MAKENULL – O(1)
Average time:     INSERT, DELETE, LOCATE – O(n)
37
Pointer implementation (linked list)

[Figure: a header cell L followed by cells holding a1, a2, …, an, each cell pointing to the next; the last cell's next is nil.]

type
  celltype = record
    element : elementtype;
    next : ↑celltype
  end;
  LIST = ↑celltype;
  position = ↑celltype;

Position i : a pointer to cell i−1,  1 ≤ i ≤ n+1.

function END (L : LIST) : position;
var q : position;
begin
  q := L;
  while q↑.next <> nil do
    q := q↑.next;
  END := q
end;
38
cell n
an
.
LIST: record
first: ↑ celltype;
last: ↑ celltype
end;
Insert x at p
……
time O(1)
a
…..
b
p
Delete cell at p
Time O(1)
…
a
c
b
….
p
Time O(n)
L
Header
p
PREVIOUS (p, L)
39
INSERT, DELETE, RETRIVE, NEXT, FIRST, MAKENULL –
O(1)
PREVIOUS, LOCATE, END – O(n)
Compare the two implementations:

1. maximum size of the list must be fixed – array
2. waste of space – both
3. operation speeds:

               array      pointer
   INSERT      O(n)       O(1)
   DELETE      O(n)       O(1)
   PREVIOUS    O(1)       O(n)
   END         O(1)       O(1) or O(n)

4. pointer representation can be dangerous!
   e.g.   q := NEXT(p, L);
          INSERT(x, p, L);
          …
          if q = NEXT(p, L) then …      { now q ≠ NEXT(p, L)! }

DOUBLY-LINKED LISTS

[Figure: cells 1 … n holding a1 … an, each cell with both a next and a previous pointer.]
type
  celltype = record
    element : elementtype;
    next, previous : ↑celltype
  end;
  position = ↑celltype;

Position i : a pointer to cell i.

function LAST (L : LIST) : position;
begin
  LAST := L↑.previous
end;

WHAT HAPPENS IF POINTERS AREN'T AVAILABLE?  USE CURSORS!
PATTERN MATCHING IN STRINGS

A = {a1, a2, …, ak}    ALPHABET (SYMBOLS / CHARACTERS)

A STRING:  x = a1 a2 … an,   n ≥ 0,  ai ∈ A

STRINGS ARE A SPECIAL CASE OF LISTS.

PATTERN MATCHING:
  x = a1 a2 … an
  pat = b1 b2 … bm
Is pat a substring of x?  i.e.
  (∃ i : 1 ≤ i ≤ n−m+1)  ai ai+1 … ai+m-1 = b1 b2 … bm

Example:
        1 2 3 4 5 6 7 8 9 10 11
  x   = a a b b a b b b a a  a
  pat = bab   → yes, i = 4
  pat = abab  → no
42
SIMPLE ALGORITHM
x = aabbabbbaaa
pat = bab
aabbabbbaaa
bab
NO
aabbabbbaaa
bab
NO
aabbabbbaaa
bab
YES!
BUT FOR pat = aaa WE NEED TO MOVE
FROM aabbabbbaaa TO aabbabbbaaa
aaa
aaa
SIMILARLY for pat = abab, from
1234567891011
aabbabbbaaa
abab
TO
aabbabbbaaa
abab
8 = 11 – 4 +1
43
WORST CASE

x = a1 a2 … am am+1 … an-m+1 … an;  pat = b1 b2 … bm is aligned against each
of the n−m+1 starting positions in turn.

EACH PASS TAKES O(m) COMPARISONS, HENCE
  (n−m+1)·O(m) = O(m(n−m+1)) = O(mn)

procedure find (x, pat : STRING; var found : Boolean; var i : position);
{ found is set to false if pat doesn't occur in x; otherwise found is set to
  true and i is set to the first position in x where pat begins }
var p, q : position;
begin
  if not EMPTY(x) and not EMPTY(pat) then
  begin
    found := false;
    i := FIRST(x);
    while not found and i <> END(x) do
    begin
      p := i;  q := FIRST(pat);
      while RETRIEVE(p, x) = RETRIEVE(q, pat) and not found do
      begin
        p := NEXT(p, x);
        q := NEXT(q, pat);
        if q = END(pat) then found := true
      end;
      if not found then i := NEXT(i, x)
    end
  end
end;

IF END(L) IS O(1) THEN T(n, m) = O(mn).
THE KNUTH-MORRIS-PRATT (KMP) ALGORITHM

x   = abaababaabacabaababaabaab
pat = abaababaabaab
                 ↑ mismatch ('c' in x against 'a' in pat)

WHAT DO YOU DO NEXT?

If x = … u w u c …  and  pat = u w u a …, where u is the longest prefix of
the matched part of pat that is also a suffix of it, slide pat right so that
its leading u lines up with the u just scanned in x, and resume comparing at
the mismatched character of x:

  x   = abaababaabacabaababaabaab
        abaababaabaab               start comparing → mismatch
              abaababaabaab         start comparing → mismatch
                 abaababaabaab      start comparing → mismatch
                    abaababaabaab   comparing resumes, eventually a match

Repeating the idea, each shift reuses the part of pat already known to
match, so the scanned character of x is never backed up.

THE NUMBER OF COMPARISONS IS O(n).  BUT HOW DO WE FIND OUT WHAT u IS?
47
LET pat = b1 b2 … bm.  FOR EACH 1 ≤ j ≤ m, LET

  f(j) = the largest i such that 0 < i < j and b1…bi = b_{j-i+1}…b_j
       = 0 if no such i exists

f(j) < j.   f IS CALLED THE FAILURE FUNCTION.

  j      1  2  3  4  5  6  7  8  9  10 11 12 13
  pat    a  b  a  a  b  a  b  a  a  b  a  a  b
  f(j)   0  0  1  1  2  3  2  3  4  5  6  4  5
TIME COMPLEXITY:
T (n, m) = O (n + complexity of defining g)
= O (n + complexity of defining f)
0
f(j) =
if j = 1
fs(j -1) +1
where s is the smallest I such
that bfi(j-1) +1 = bj
0
if no such i exist
f i (j -1) = f (f(… f(j -1)…..)
i times
49
f 3 (j -1) = f (f (f (j-1)))
T(j-1) T1
u
j-1
a
u
a
j
f(j- 1)
HERE f(j) = f (j -1) +1
j -1
U
b u
a
i= f(j -1)
j
u
j -1
a
b
a
w
w
j
f(i) = f (f(i-1)) = f2 (i-1)
50
f2(i-1) +1
procedure fail (pat : array [1..m] of char; var f : array [1..m] of integer);
var i, j : integer;
begin
  f[1] := 0;
  for j := 2 to m do
  begin
    i := f[j-1];
    while (pat[j] <> pat[i+1]) and (i > 0) do i := f[i];
    if pat[j] = pat[i+1] then f[j] := i+1
    else f[j] := 0
  end
end

T(m) = O(m)!
51
procedure KMP (x, pat, g, found, i);
{ x[1..n], pat[1..m] are strings;  g[j] = g(j), 1 ≤ j ≤ m }
var p, q : integer;
begin
  found := false;
  if (n <> 0) and (m <> 0) then
  begin
    p := 1;  q := 1;
    while (p <> n+1) and (q <> m+1) do        { time O(n) }
      if x[p] = pat[q] then
      begin
        p := p+1;
        q := q+1
      end
      else if q = 1 then p := p+1
      else q := g[q];
    if q = m+1 then
      begin found := true; i := p-m end
    else found := false
  end
end;
52
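For reference, here is a compact C sketch of the failure function and the KMP matcher; it uses the standard 0-based convention (f[j] = length of the longest proper prefix of pat[0..j] that is also its suffix), which shifts the notes' indices by one, and the helper names and the size limit are ours.

#include <stdio.h>
#include <string.h>

static void compute_fail(const char *pat, int m, int f[]) {
    f[0] = 0;
    for (int j = 1; j < m; j++) {
        int i = f[j-1];
        while (i > 0 && pat[j] != pat[i]) i = f[i-1];   /* fall back via f */
        f[j] = (pat[j] == pat[i]) ? i + 1 : 0;
    }
}

/* returns the 0-based index of the first occurrence of pat in x, or -1 */
int kmp_find(const char *x, const char *pat) {
    int n = (int)strlen(x), m = (int)strlen(pat);
    if (m == 0) return 0;
    if (m > 256 || m > n) return -1;          /* sketch assumes m <= 256 */
    int f[256];
    compute_fail(pat, m, f);
    for (int p = 0, q = 0; p < n; p++) {
        while (q > 0 && x[p] != pat[q]) q = f[q-1];
        if (x[p] == pat[q]) q++;
        if (q == m) return p - m + 1;          /* match found */
    }
    return -1;
}

int main(void) {
    /* the lecture example: the match starts right after the 'c' */
    printf("%d\n", kmp_find("abaababaabacabaababaabaab", "abaababaabaab"));
    return 0;                                  /* prints 12 */
}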
ADT STACK

"LAST-IN-FIRST-OUT" LIST (LIFO)

OPERATIONS:
MAKENULL(S) : make stack S empty
TOP(S)  : return the top element of S;   TOP(S) = RETRIEVE(FIRST(S), S)
POP(S)  : delete the top element of S;   POP(S) = DELETE(FIRST(S), S)
          (sometimes POP is defined as a function that returns the element being popped)
PUSH(x, S) : insert x at the top of S;   PUSH(x, S) = INSERT(x, FIRST(S), S)
EMPTY(S) : return true if S is empty, false otherwise
53
A SIMPLE EXAMPLE:

# : erase character; it cancels the previous uncancelled character
@ : kill line; it cancels all previous characters on the line

abc#d@aa#b  ≡  ab

procedure EDIT;
var S : STACK of char;
    c : char;
begin
  MAKENULL(S);
  read(c);
  while not eoln do
  begin
    if c = '#' then POP(S)
    else if c = '@' then MAKENULL(S)
    else PUSH(c, S);
    read(c)
  end;
  { print S in reverse order }
end
54
ARRAY IMPLEMENTATION OF STACKS

[Figure: the stack grows from the high end of the array toward index 1; top
points at the topmost element, so positions top … max hold the stack and
top = max+1 means empty.]

type
  position = 1..max;
  STACK = record
    top : 1..max+1;
    elements : array [position] of elementtype
  end;

• PUSH, POP, TOP – O(1)
55
stack
MORE SPACE – EFFICIENT IMPLEMENTATION
POINTER IMPLEMENTAION
Stack
a
b
c
.
 MANY STACKS IN ONE ARRAY
TOP
1
2
STACK 1
3
STACK 2
BOTTOM
1
2
3
STACK 3
Stack pace
56
tree
ADT QUEUE
A QUEUE IS A “First – in – First – Out” LIST > (FIFO)
OPERATIONS:
MAKENULL (Q);
FRONT (Q) : return the first element of Q
FRONT (Q) = retrieve (first (Q), Q)
ENQUEUE (x, Q) : inserts x at the end of Q
ENQUEUE (X, Q) = INSERT (X, END (Q), Q)
DEQUEUE (Q): DELETES THE FIRST ELEMENT OF Q
DEQUEUE (Q) = DELETE (FIRST (Q),Q)
EMPTY (Q):
57
POINTER IMPLEMENTATION
header
a1
a2
…
front
near
type celltype = record
element : elementtype;
next : ↑ celltype
end;
QUEUE = record
Front, rear : ↑ celltype
End;
FUNCTION EMPTY (Q : QUEUE) : Boolean;
Begin
If Q. front = Q.rear then EMPTY: = true
Else EMPTY : = false
End
58
an
EACH OPERATION – 0 (1)
ARRAY IMPLEMENTATION

[Figure: elements stored in positions front … rear of an array of size max;
the 1st element is at front, the last at rear.]

DEQUEUE – O(1), but ENQUEUE – O(n) when elements have to be shifted.

CIRCULAR ARRAY IMPLEMENTATION (BUFFER!)

[Figure: the positions 1 … max are arranged in a circle; the queue a1 … an
occupies a contiguous arc from front to rear, and both indices wrap around
from max to 1.]

HOW DO WE DISTINGUISH BETWEEN FULL AND EMPTY?
• MAINTAIN AN EXTRA BIT, or
• leave one position unused and declare the queue FULL when
  FRONT = addone(addone(rear)).
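A minimal C sketch of the circular buffer; it uses the slightly different but equivalent convention that rear points one past the last element, so the queue is full when addone(rear) = front. MAX, the int element type and the function names are ours.

#include <stdio.h>

#define MAX 8                                  /* capacity is MAX - 1 */

typedef struct { int elems[MAX]; int front, rear; } Queue;
/* front = index of the first element, rear = index one past the last */

static int addone(int i)        { return (i + 1) % MAX; }
void  makenull(Queue *q)        { q->front = q->rear = 0; }
int   empty(const Queue *q)     { return q->front == q->rear; }
int   full(const Queue *q)      { return addone(q->rear) == q->front; }

int enqueue(int x, Queue *q) {                 /* returns 0 on overflow */
    if (full(q)) return 0;
    q->elems[q->rear] = x;
    q->rear = addone(q->rear);
    return 1;
}
int dequeue(Queue *q, int *x) {                /* returns 0 on underflow */
    if (empty(q)) return 0;
    *x = q->elems[q->front];
    q->front = addone(q->front);
    return 1;
}

int main(void) {
    Queue q; int x;
    makenull(&q);
    for (int i = 1; i <= 5; i++) enqueue(10 * i, &q);
    while (dequeue(&q, &x)) printf("%d ", x);  /* prints 10 20 30 40 50 */
    printf("\n");
    return 0;
}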
An application of stacks – searching a maze (the "rat in a maze"), by backtracking.

• mark[i, j] = 1 if we have been to (i, j), 0 otherwise
• IF THERE IS NO WAY OUT, BACK UP ONE CELL AND TRY A DIFFERENT MOVE
• WE MUST STORE THE CURRENT PATH SOMEWHERE

A PATH (i1, j1), (i2, j2), …, (is, js) is kept on a STACK, with (is, js) on top
and (i1, j1) at the bottom.

From cell (i, j) there are 8 possible moves:

  NW (i−1, j−1)   N (i−1, j)   NE (i−1, j+1)
  W  (i,   j−1)     (i, j)     E  (i,   j+1)
  SW (i+1, j−1)   S (i+1, j)   SE (i+1, j+1)

type
  offsets = record
    x : -1..1;
    y : -1..1
  end;
type directions = (N, NE, E, SE, S, SW, W, NW);
var move : array [directions] of offsets;

  d     move[d].x   move[d].y
  N        -1           0
  NE       -1           1
  E         0           1
  SE        1           1
  S         1           0
  SW        1          -1
  W         0          -1
  NW       -1          -1

var
  maze : array [0..m+1, 0..n+1] of 0..1;
  mark : array [1..m, 1..n] of 0..1;
63
type dir = (N, NE, E, SE, S, SW, W, NW, D);      { D = dead end }
type elementtype = record
  x : 1..m;
  y : 1..n;
  start : dir
end;
STACK = … ;
var path : STACK;

function NEXTMOVE (loc : elementtype) : dir;
var d : dir;
    s, r, i, j : integer;
    found : Boolean;
begin
  i := loc.x;  j := loc.y;  d := loc.start;  found := false;
  while (d <> D) and not found do
  begin
    s := move[d].x;  r := move[d].y;
    if (maze[i+s, j+r] = 0) and (mark[i+s, j+r] = 0)
      then found := true
      else d := succ(d)
  end;
  NEXTMOVE := d        { D means dead end }
end
procedure rat (var maze : array [0..m+1, 0..n+1] of 0..1);
var
  mark : array [1..m, 1..n] of 0..1;
  path : STACK;
  location : elementtype;
  d : dir;
  function NEXTMOVE (x : elementtype) : dir; …;
begin
  mark := (0);  MAKENULL(path);               { initialization }
  location := (1, 1, E);  mark[1, 1] := 1;
  PUSH(location, path);
  while not EMPTY(path) do
  begin
    location := TOP(path);  POP(path);
    d := NEXTMOVE(location);
    if d <> D then
    begin
      location.start := succ(d);  PUSH(location, path);
      location.x := location.x + move[d].x;
      location.y := location.y + move[d].y;
      if (location.x = m) and (location.y = n) then
        begin print(path); return end
      else begin
        PUSH((location.x, location.y, N), path);
        mark[location.x, location.y] := 1
      end
    end
  end
end

TIME COMPLEXITY OF rat : O(mn)
SPACE COMPLEXITY OF rat : O(mn)
BUT WITHOUT USING mark the time is exponential, O(8^(mn)).
An application of queues - breadth-first search in trees
tree T
V1
V2
a1
a2
V3
a3
V4
V5
a4
V6
a5
a6
V9
V7
a7
V8
a8
V10
a9
66
a10
V11
a11
binary
LEFT (v),
║
left child
of V
RIGHT(v)
║
right child
of V
e.g., LEFT (v3 = null, RIGHT (V3) = V6
DATA(V1) = a1
ROOT(T) = v1
Searching in tree
Given tree T and data x,
find a node v of T s.t. DATA(v) = X.
Possible approaches:
1. Breadth-first search
try level 1, then level 2, then
level 3, ...etc.
2. Depth-first search:
search along the leftmost path until the leaf is reached, then backup, try the 2nd leftmost path, ...etc.
67
Breadth-first
X = 20
V1
10
50
V2
V3
5
V4
V5
2
V6
60
20
V12
V10
V7
4
V8
2
V9
V11
5
20
Searching v1, v2, v3, v4, v5, v6, . . .
Depth-first
Searching order: v1, v2, v3, v4, v5, v6, . .
68
7
30
procedure DSearch (x, T);
begin
  if x = DATA(ROOT(T)) then
    print ROOT(T)
  else begin
    DSearch(x, left subtree of T);
    DSearch(x, right subtree of T)
  end
end;

Nonrecursive version:

procedure DSearch (x, T);
var path : STACK of nodes;
    v : node;
begin
  v := ROOT(T);  MAKENULL(path);
  PUSH(v, path);
  while not EMPTY(path) do begin
    v := TOP(path);  POP(path);
    if DATA(v) = x then print v
    else begin
      PUSH(LEFT(v), path);
      PUSH(RIGHT(v), path)
    end
  end
end

Time: O(n)
Space: O(n);  average space: O(log n)
procedure BSearch (x, T);
var level : QUEUE of nodes;
    v : node;
begin
  v := ROOT(T);
  MAKENULL(level);
  ENQUEUE(v, level);
  while not EMPTY(level) do
  begin
    v := FRONT(level);  DEQUEUE(level);
    if DATA(v) = x then
      begin print v; stop end
    else begin
      ENQUEUE(LEFT(v), level);
      ENQUEUE(RIGHT(v), level)
    end
  end
end;

Time: O(n),  n = |T|  (the size of T)
Space: O(n);  average: O(n)
70
Application – implement a DOS command cd:\
Cd:\ name – change current directory to subdirectory name
A:
job
letters
study
WP 5.0
letters
project
homework
letters
What should cd:\letters do?
BFS
DFS?
When do we use DFS?
e.g., solution tree
71
Proc. A(x1, x2,....)
Var y1, y2, …
Begin
.
.
A(a1, a2, …)
L1
….
….
Proc. A(x1, x2,....)
Var y1, y2, …
begin
.
.
.
.
.
Proc. A(x1, x2,....)
Var y1, y2, …
begin
.
.
.
.
.
A(b1, b2, …)
L3
.
.
.
.
.
.
B(c1, c2, …)
L3
.
.
.
.
.
.
.
72
Proc. B(x1, x2,....)
Var f1, f2, …
begin
.
.
.
.
.
.
.
.
.
.
.
.
.
Ellmination of Recursion
Sometimes it is absolutely necessary to eliminate recursive
• recursive calls are not supported e.g., FORTRAN
• speed is the first priority - do it by yourself
Solution: STACK of activation records
Generally, an activation record holds
1. current values of the parameters (pass by value)
2. current values of the local variables
3. a label indicating return address
Assume that if procedure p(x1, x2 …. var y1, y2, ….)
then the recursive call is p(a1, a2, …, y1, y2, ….)
73
General Rules:
Procedure P (x1, x2: int; var y: int);
Var i, j: int;
Begin
____________________________________;
____________________________________;
.
.
.
P(a1, a2,y)
_________________________________________;
_________________________________________;
. .
.
74
end;
Example 1

procedure Ackermann (m, n : integer; var A : integer);
begin
1.  if (n < 0) or (m < 0) then writeln('error')
    else
2.  if m = 0 then A := n+1
    else
3.  if n = 0 then Ackermann(m-1, 1, A)
    else begin
      Ackermann(m, n-1, A);
      Ackermann(m-1, A, A)
    end
end;
75
Recursion Elimination

procedure Ackermann (m, n : integer; var A : integer);
label 1, 2, 3;
var S : STACK of record
          m, n, l : integer
        end;
begin
  MAKENULL(S);
1: if (n < 0) or (m < 0) then writeln('error')
   else
   if m = 0 then A := n+1
   else
   if n = 0 then
   begin
     PUSH((m, n, 3), S);
     m := m-1;  n := 1;
     goto 1
   end
   else begin
     PUSH((m, n, 2), S);
     n := n-1;  goto 1;
2:   PUSH((m, n, 3), S);
     m := m-1;  n := A;  goto 1
   end;
3: if not EMPTY(S) then
   begin
     (m, n, l) := TOP(S);  POP(S);
     case l of
       2: goto 2;
       3: goto 3
     end
   end
end; {Ackermann}
More details in [AHU] pp. 64- 69.
[HS] pp. 150-153.
* The method works only when
 no pass-by-reference parameters
 or, same p-b-r parameters are passed each time (e.g.,
function)
General case???
POINTER!!!
p(...,var x:type1)
 p(...,xp:↑type1)
77
no global variables
procedure
R(x:integer var y,z: integer);
var
i: integer;
begin
---------------------
--------------------y:=x*i
------------------------------------R(a,i,y);
-------------------
-----------------end;
Trees
78
Basic Terminology
1
2
4
3
12
5
7
6
1.
2.
9
8
11
10
a single node is a tree, also the root.
if T1,T2 are trees with roots
n1, n2, …., nk. Then
n
nk
T=
n1
n2
T1
T2
Is a tree with root n.
n1, n2, …, nk are the children of n.
actually. A rooted tree or oriented
siblings
79
TR
a subtree of n(and of T)
n is the parent of n1, n2, …, nk
Note the every node (except root) has a unique parent.
A node with no children is a leaf.
A non-leaf node is also called an internal node.
n1
n2
n3
nk
n1, n2, n3,…., nk is a path of length k-1 from n1 to nk
Note: n1 is a path of length 0
n1 is an ancestor of nk
nk is a desendent of n1
height of n: length of the longest path from n to a leaf
80
height of a leaf is 0!
depth of n: length of the unique path from root to n.
depth of root is o!
height (or depth) of a tree: height of root.
Order of nodes
in a tree,
siblings are ordered from left-to-right
(ordered)
a
≠
a
b
c
c
b
if n is to the left of n2 then all descendents of n1 are to the left of all
descendents of n2
81
Tree Traversals
n
T
T1
T2
Preorder traversal of T is
n,
DFS
preorder traversal of T1
preorder traversal of T2
preorder traversal of Tk
Inorder traversal of T is
i.t. of T1 n, i.t. of T2 ..., i.t. of Tk
Postorder traversal of T is
p.t. of T1 p.t. of T2, ..., p.t. of Tk, n.
↑ evaluation of expression trees, divide-and conquer
82
TR
Example
1
1
4
2
3
10
9
8
7
5
6
Preorder:
1, 2, 5, 3, 6, 7, 4, 8, 9, 10
Inorder:
5, 2, 1, 6, 3, 7, 8, 4, 9, 10
Postorder:
5, 2, 6, 7, 3, 8, 9, 10, 4, 1
Preorder:
we list a node the first time we pass it
Postorder:
we list a node the last time we pass it
Inorder:
we list
the first time, but list an interior node the
second time we pass it
83
procedure Preorder (T:tree);
var
v: node;
begin
V := ROOT(T);
Print v;
for each subtree T of v, from left to right
do
Preorder (T)
end;
time complexity: O(|T|) ← number of nodes
Pre/In/Post
space complexity: 0 (height of T) ← stack
84
Procedure Preorder (T:tree); // no stack //
var
v: node;
begin
V := ROOT(T)
while v ≠ null do
begin
print v;
if v ≠ leaf then v := 1st child of v
back up until
v is not the
last child of
parent(v)
else
while
v ≠ null and
v = last child of Parent(v)
do
v := Parent(v);
if v ≠ null then
v := next sibling of v
end
end;
time = O(|T|) if parent () is 0(1)
space ÷?
85
Reconstructing a tree from its traversals
 Preorder and Postorder traversals are sufficient.
 Preorder and Inorder traversals aren’t sufficient.
a
a
b
c
b
e
d
c
d
Inorder and Postorder traversals aren’t sufficient.
example trees?
Any single traversal isn’t sufficient.
(pre/in/post)
86
e
Labelled Trees, Expression Trees
n1 *
+
+
n3
n2
a
a
b
n4
n5
c
n6
n7
n2 represents a+b
n3 represents a+c
n1 represents (a + b) * (a + c)
Evaluation can be done by a postorder traversal.
pre/in/post-order listings give
prefix (Polish),
infix,
postfix (Reverse Polish)
↑
*+ab+ac
↑
a+b*a+c
↑
ab+ac+*
87
ADT TREE
1. PARENT (n,T).: node. If no parent return null node.
2. LEFTMOST-CHILD (n,T) : node
3. RIGHT-SIBLING (n,T): node
returns the sibling immediately following n.
4. LABEL (n,T): label
≡ DATA(n,T)
5. ROOT(T) : node
6. MAKENULL(T)
7. CREATEL (v1, T1, T2 Ti ): tree; i=O,1,2,...
v
n
T1
Ti
T2
Alternative: ATTACH (T1 T2 ) : tree
8. DELETE (n,T) - delete the subtree rooted at n.
88
a1
n1
a2
a5
n2
a3
a4
n3
n4
a7
n5
a9
a6
a8
n6
n6
n8
LEFTMOST-CHiLD (n1, T = n2 )
RIGHT-SIBLING (n1, T) = n4
RIGHT-SIBLING (n7, T) = ^
procedure PREORDER (n:node);
//list labels of descendents of n in T (global)
in preorder!!
var
begin
print LABEL (n,T);
n := LEFTMOST-CHILD (n,T)
while n ≠ ^ do begin
PREORDER (n);
n := RIGHT-SIBLING (n,T)
end
end;
89
n9
Array Implementation (array of parents)

[Figure: a tree with 10 nodes labelled a, b, c; node 1 is the root.]

  node    1  2  3  4  5  6  7  8  9  10
  parent  0  1  1  2  2  5  5  5  3  3       (0 ≡ Λ)
  label   a  b  a  c  b  a  b  c  a  b

e.g. PARENT(10, T) = 3.

If node i is to the left of node j then i < j, i.e. number siblings from
left to right (e.g., preorder, or even inorder).

type
  node = 1..max;
  cell = record
    parent : 0..max;
    label : labeltype
  end;
  TREE = array [1..max] of cell;
function LEFTMOST-CHILD (n : node; T : TREE) : node;
var i : integer;
begin
  i := 1;
  while (i <= max) and (T[i].parent <> n) do
    i := i+1;
  if i > max then LEFTMOST-CHILD := 0
  else LEFTMOST-CHILD := i
end;                                           { time: O(|T|) }

function RIGHT-SIBLING (n : node; T : TREE) : node;
var i : integer;  parent : node;
begin
  parent := T[n].parent;
  i := n+1;
  while (i <= max) and (T[i].parent <> parent) do
    i := i+1;
  if i > max then RIGHT-SIBLING := 0
  else RIGHT-SIBLING := i
end;                                           { time: O(|T|) }
91
Trees as lists of children
Label children node right sibling
1
2
.
.
.
.
.
.
3
4
5
6
7
8
9
10
6
node space
type
node = 1 .. max
LIST = …
TREE = record
header : array [1..max] of LIST;
labels : array [1..max] of labletype
root : node
end;
no matter how LIST is implemented,
LEFTMOST-CHILD; RIGHT- SIBLING _ 0(1)
PARENT – 0(|T|)
If want 0(1) for all, add parent field
92
7
Considering CREATE (n, T1, T2, …, Ti);
node space
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
T1
T
T2
A
6
C
4
.
12
.
.
B
10
G
I
D
F
E
.
H
.
.
2
.
.
11
8
14
I
A
E
T1
T2
B
C
F
D
93
G
H
Simplified
Leftmost=child & right-sibling representation
A
C
B
D
Leftmost
Child
label
right
siblings
3
8
B
5
5
0
C
0
7
8
3
0
A
D
0
0
Var cellspace
:
array [1..max] of record
Label : labeltype;
Leftmost-child, right-sibling:0 .. max
End
94
SUMMARY
1.
Array of Parents
• PARENT--O(1)
• LEFTMOST-CHILD, RIGHT-SIBLING - O(|T|)
ALL-CHILDREN — 0(m)
• simple, space-efficient
2.
List of Children
• LEFTMOST-CHILD - 0(1)
• PARENT, RIGHT-SIBLING -- 0(|T|)
• can store several trees, CREATE
3.
Leftmost-child, Right-sibling
• LEFTMOST-CHILD, RIGHT-SIBLING -- 0(1)
• PARENT — O(|T|)
• make tree, CREATE, slightly more space than (2)
95
BINARY TREES
 A node is a binary tree
 If T is a binary tree, v is a node, then
V
V
T
T
If T1, T2 binary trees, v a node then
V
T2
T1
A binary tree is NOT a tree!!!
A
A
B
≠
96
B
Binary Trees

• A child is either a left or a right child.
• Binary trees are not really trees.

full binary tree: every internal node has two children and all leaves have
the same depth.

complete binary tree: obtained from a full binary tree as follows: fix a
leaf and delete all the leaves to the right of it.

• number of nodes of depth i ≤ 2^i
• size of a binary tree of depth d ≤ Σ_{j=0}^{d} 2^j = 2^{d+1} − 1
• if complete:  2^d − 1 < size ≤ 2^{d+1} − 1
• if full:      size = 2^{d+1} − 1
• size − 1  ≥  depth  ≥  log₂(size + 1) − 1
97
Binary tree traversals
v
T1
T2
Preorder (T):
V, preorder (T1), preorder (T2)
Inorder (T):
*
Inorder (T1), v, inorder (T2)
v
T2
Postorder (T):
Postorder (T1), postorder (T2), v
98
How to reconstruct a binary tree from its traversals?

• Just the preorder (or inorder, or postorder) traversal is not enough.
• Preorder & postorder aren't enough!  (A root a with a single left child b
  and a root a with a single right child b have the same preorder ab and the
  same postorder ba.)

• Preorder a1, a2, …, an AND inorder b1, b2, …, bn are enough:
  1. Find i s.t. a1 = bi.
  2. Then  T1 = Reconstruct(a2, …, ai ;  b1, …, bi-1)
           T2 = Reconstruct(ai+1, …, an ;  bi+1, …, bn)
     and T is the tree with root a1, left subtree T1 and right subtree T2.

• Postorder & inorder: similar.
99
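A small C sketch of the reconstruction just described (preorder + inorder, labels assumed distinct); the names Node and build, and the example tree, are ours.

#include <stdio.h>
#include <stdlib.h>

typedef struct Node { char label; struct Node *left, *right; } Node;

/* pre[0..n-1] and in[0..n-1]: preorder and inorder of the same tree */
Node *build(const char *pre, const char *in, int n) {
    if (n == 0) return NULL;
    Node *t = malloc(sizeof *t);
    t->label = pre[0];                         /* a1 is the root */
    int i = 0;
    while (in[i] != pre[0]) i++;               /* find i with a1 = b_i */
    t->left  = build(pre + 1,     in,         i);         /* a2..ai ; b1..b(i-1)   */
    t->right = build(pre + 1 + i, in + i + 1, n - i - 1);  /* a(i+1)..an ; b(i+1)..bn */
    return t;
}

void postorder(const Node *t) {                /* print postorder as a check */
    if (t == NULL) return;
    postorder(t->left);
    postorder(t->right);
    putchar(t->label);
}

int main(void) {
    /* root a, left child b (children d, e), right child c */
    Node *t = build("abdec", "dbeac", 5);
    postorder(t);                              /* prints debca */
    putchar('\n');
    return 0;
}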
Representation of binary trees
A
B
.
D
.
C
.
.
.
E
.
Type node = record
label : labeltype;
left, right : ↑ node
end
TREE ↑ node;
Notes:
1. cursors may also be used.
2. if operation PARENT ( ) is crucial, a parent
field could be included.
3. but if traversal is the only concern, then
the parent field is not really needed.
100
F
.
procedure PREORDER (T : TREE);
var temp, tempparent, tempchild : ↑node;

  procedure BACKUP;
  { find the successor of temp in the preorder traversal }
  var stop : boolean;
  begin
    stop := false;
    temp := tempparent;
    while (temp <> nil) and not stop do
    begin
      if temp↑.tag = 0 then
      begin
        tempparent := temp↑.left;
        temp↑.left := tempchild;           { restore the left pointer }
        if temp↑.right <> nil then
        begin
          tempchild := temp↑.right;
          temp↑.right := tempparent;       { save the parent in the right pointer }
          temp↑.tag := 1;
          tempparent := temp;
          temp := tempchild;
          stop := true; return
        end
      end
      else begin                            { temp↑.tag = 1 }
        tempparent := temp↑.right;
        temp↑.right := tempchild            { restore the right pointer }
      end;
      tempchild := temp;  temp := tempparent
    end
  end; { end of BACKUP }

begin
  { print nodes of T in preorder }
  temp := T;
  tempparent := nil;
  while temp <> nil do
  begin
    print temp↑.label;
    if temp↑.left <> nil then begin
      tempchild := temp↑.left;
      temp↑.left := tempparent;             { save the parent in the left pointer }
      temp↑.tag := 0;
      tempparent := temp;
      temp := tempchild
    end
    else if temp↑.right <> nil then begin
      tempchild := temp↑.right;
      temp↑.right := tempparent;
      temp↑.tag := 1;
      tempparent := temp;
      temp := tempchild
    end
    else
      { temp↑.left = temp↑.right = nil }
      BACKUP
  end
end; { end of PREORDER }
102
Threaded binary trees
0
0
0
0
.
1
1
lefttag
righttag =
=
0
1
1
1
0
1
→ left = leftchild
→left = leftthread (predecessor) in inorder
0
1
→ right = right child
right thread (successor)
predecessor/successor in inorder can be found without using stack or
flipping
103
.
.
Representation of complete binary trees

[Figure: a complete binary tree with nodes A … J numbered 1 … 10 level by
level, stored in an array:

   i     1 2 3 4 5 6 7 8 9 10
  T[i]   A B C D E F G H I J   ]

parent of node i      = ⌊i/2⌋  (largest integer ≤ i/2),   1 < i ≤ n
left child of node i  = 2i,       if 2i ≤ n
right child of node i = 2i+1,     if 2i+1 ≤ n

type TREE = record
  n : 0..max;
  labels : array [1..max] of labeltype
end;
104
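The index formulas above written out in C, with a tiny sanity check on the 10-node tree A … J (the function names are ours):

#include <stdio.h>

int parent(int i)             { return i / 2; }           /* floor(i/2), 1 < i <= n */
int left_child(int i, int n)  { return 2*i     <= n ? 2*i     : 0; }
int right_child(int i, int n) { return 2*i + 1 <= n ? 2*i + 1 : 0; }

int main(void) {
    const char labels[] = " ABCDEFGHIJ";                    /* labels[1..10] */
    int n = 10;
    printf("parent of %c is %c\n", labels[5], labels[parent(5)]);            /* B */
    printf("children of %c: %c %c\n", labels[2],
           labels[left_child(2, n)], labels[right_child(2, n)]);             /* D E */
    return 0;
}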
A
B
D
C
.
.
.
.
E
.
.
H .
Var
temp, tempparent, tempchild : ↑ node;
tag =
0→
left points to parent
1→
right points to parent
type
node = record
label : labeltype;
left, right: ↑ node;
tag:0..1;
end;
TREE = ↑ node;
105
F .
G
.
An application of binary trees – Huffman codes

characters:  A = {a1, a2, …, ak}
a string or message:  x1 x2 … xn,  xi ∈ A
p(ai) – the probability that ai will appear in a message

Encoding: assign a binary code c(ai) to each ai;
  c(x1 x2 … xn) = c(x1) c(x2) … c(xn)

Decoding: given a code b1 b2 … bm, find the unique message x1 x2 … xn such
that c(x1 x2 … xn) = b1 b2 … bm.

Average code length:   Σ_{i=1}^{k} p(ai)·|c(ai)|,   where |c(ai)| is the length of c(ai).
106
character   probability   code 1   code 2   code 3
    a           .30         000      01       00
    b           .10         001      0010     01
    c           .10         010      0011     10
    d           .10         011      000      000
    e           .40         100      1        1

average                      3       2.1      1.7

Prefix property: c(ai) is not a prefix of c(aj) for any j ≠ i.
e.g., Code 1 and Code 2 have the prefix property; Code 3 doesn't!

Claim: the prefix property makes decoding easy, e.g. consider decoding 000…
  Code 1: a …     (on-line decoding)
  Code 2: d …
  Code 3: ??  (ambiguous)
107
Huffman Code – an optimal (least average code length) prefix code

Algorithm Huffman ({a1, a2, …, an});
{ find the Huffman code c(ai) for each ai }
  if n = 1 then c(a1) := ε
  else begin
    let ai and aj be two characters such that p(ai) and p(aj) are the lowest
      among a1, a2, …, an;
    let a' be a new character with p(a') = p(ai) + p(aj);
    Huffman({a1, a2, …, an} − {ai, aj} + {a'});
    c(ai) := c(a')·0;
    c(aj) := c(a')·1
  end
end;

Example: {a, b, c},   p(a) = 0.5,  p(b) = 0.3,  p(c) = 0.2

Huffman({a, b, c}) merges b and c into [bc];
Huffman({a, [bc]})  ⟹  c(a) = 0, c([bc]) = 1
                    ⟹  c(a) = 0, c(b) = 10, c(c) = 11
Binary tree representation of prefix code
0
1
1
0
0
0
0
0
e
a
d
0
1
0
1
0
a
b
c
d
e
code 1
0
b
1
c
code 2
type
node = record
left, right :↑ node;
probability : real ;
character : {a1, a2, …., ak)
end;
used only in leaves
A more efficient implementation is given in [AHU] pp.94 -101
109
example
1)
a
.10
b
.20
c
.05
d
.05
e
.10
.10
.20
.05
.05
a
b
c
d
.10
.20
.10
.10
f
.30
g
.10
h
.10
.10
.30
.10
.10
e
f
g
.30
.10
.10
h
2)
a
b
e
.05
f
g
h
.05
c
d
(3) & (4)
.20
.20
called a forest
a
.10
.10
.05
.05
c
d
110
.10
.10
e
g
.20
.30
b
f
.10
h
(5)
.20
.20
called a forest
a
.10
.10
.10
.10
e
g
.05
c .30
d
.30
.30
f
.20
.10
b
h
(6)
.20
.40
.10
.10
.10
.20
g
e
.10
a
.05
c
.05
d
111
(7)
.20
.40
.10
.10
.10
.20
g
e
.05
.10
a
c
.05
d
.60
.30
.30
f
.20
b
.10
h
112
(6)
1
0
1
0
1
0
f
0
0
1
0
0
1
1
1
e
g
b
h
a
c
d
using a modified preorder listing, we can print the Huffman codes for
the characters (using a stack)
Algorithm Huffman-Tree;
{ construct a Huffman tree for characters a1, a2, …, an }
var forest : array [1..max] of TREE;
    p : real;
begin
  for i := 1 to n do
  begin
    new(forest[i]);
    forest[i]↑.left := nil;
    forest[i]↑.right := nil;
    forest[i]↑.probability := p(ai);
    forest[i]↑.character := ai
  end;
  while forest contains more than one tree do begin
    i := index of the tree with the smallest probability;
    j := index of the tree with the second smallest probability;
    p := forest[i]↑.probability + forest[j]↑.probability;
    forest[i] := CREATE2((p, −), forest[i], forest[j]);
    delete tree forest[j]
  end
end
114
A set is a collection of elements/members
Notes:
1. An element can be a set!
2. A set can be infinite or empty.
3. Usually (in this course), members of a set are of the same type.
4. Members of a set are different (otherwise, a multiset).
5. Members could be nearly ordered.
A relation is a linear order on some set S
(i)
(ii)
for any a + b in S. exactly one of a<b, a+b, a>b is true.
(Trichotomy)
for a,b,c in S,
a<b, b<c ==> a<c (Transitivity)
115
Some notation:
S = { a1, a2, …an}
or S = (x|x satisfies condition?)
e.g. {1,2,...,10} = (x|x is an integer and 1 ≤ x ≤ 10)
Ø = {}
Membership:
x є S,
x ∉ S
inclusion:
S1⊆ S2
S1⊈ S2
(subset)
S1⊆ S2
iff S1 ≠ S2 and S1 ⊆ S2
superset
S1⊇ S2
proper superset:
S1⊇ S2
Union:
S1∪ S2
{1, 2}∪ (2, 3)={1,2,3}
Intersection:
S1∩S2
{1, 2} ∩{1, 3} = {2}
Difference:
S1-S2
{1, 2} – {2, 3} = {1}
116
ADT SET

1.  MAKENULL(S):   S := ∅
2.  INSERT(x, S):  S := S ∪ {x}
3.  DELETE(x, S):  S := S − {x}
4.  MEMBER(x, S):  true iff x ∈ S
5.  ASSIGN(A, B):  copy B into A
6.  EQUAL(A, B):   true iff A = B
7.  UNION(A, B, C):         C := A ∪ B
8.  INTERSECTION(A, B, C):  C := A ∩ B
9.  DIFFERENCE(A, B, C):    C := A − B
10. MERGE(A, B, C):  if A ∩ B = ∅ then C := A ∪ B, otherwise C is undefined
11. MIN(S):  returns the minimum element in S (assuming S is linearly ordered)
12. FIND(x): with disjoint A1, A2, …, An (global), find the unique Ai s.t. x ∈ Ai
13. SIZE(S), SUBSET(A, B), COMPLEMENT(A), …
117
SET with Union, Intersection, Difference
Example – data-flow analysis
B1
1.
2.
3.
GEN = {1,2,3}
KILL = {4, 5, 6, 7, 8, 9}
t: = ?
p:= ?
q:= ?
4.
5.
read (p)
read (q)
B2
GEN = {4,5}
KILL = {2, 3, 7, 8}
q ≤ p?
GEN = KILL
y
B3
GEN = {6}
KILL = {1, 9}
6. t : = p
B4
7.
8.
P:=q
q:=t
GEN = {7, 8}
KILL = {2, 3, 4, 5}
B6
GEN = KILL = ∅
P mod q =0
B6
y
Write (q)
9. t : = pmodq
B8
GEN = KILL = ∅
B7
GEN = {9}
KILL = {1,6}
GEN[i] =
{data definition in B1}
KILL[i] = {d|d ∊ Bi & ∊ d ė Bi
defining same var as D}
118
DEF1NE[i]
{d|∃ a path B1….BiBi, such that d is the last
definition of the variable defined d in the path }
reaching definitions
of Bi
DEFIN = (1,4,5)
DEFIN = (4,5,6,7,8,9)
GEN[i]= {data definitions in Bi }
KILL[i] = (data definitions not in B), but defining the same variables
as GEN[i]
DEFOUT[i] = {d|(same as in DEFIN[I] except “Bi…BiBi”)}
leaving definitions
DEFOUT[i] = (DEFIN[i] – KILL[i]) ∪GEN[i]
DEFIN[i] = ∪ DEFOUT[i]
Bi is a
predeceasor
of Bi,
i.e. there is an arc from Bi to Bi)
119
Algorithm dataflow (GEN, KILL; var DEFIN);
var
  temp : SET;
  i : integer;                       { n = number of blocks }
  changed : boolean;
begin
  for i := 1 to n do begin
    MAKENULL(DEFIN[i]);
    MAKENULL(DEFOUT[i])
  end;
  repeat
    changed := false;
    for i := 1 to n do begin
      DIFFERENCE(DEFIN[i], KILL[i], temp);
      UNION(temp, GEN[i], temp);
      if not EQUAL(temp, DEFOUT[i]) then begin
        ASSIGN(DEFOUT[i], temp);
        changed := true
      end
    end;
    for i := 1 to n do begin
      MAKENULL(DEFIN[i]);
      for each predecessor Bj of Bi do
        UNION(DEFIN[i], DEFOUT[j], DEFIN[i])
    end
  until not changed
end;
120
Example
B1
B2
1.
2.
read (x)
read (y)
3.
4.
x: = x+y
z: = 10.0
GEN[1] ={1,2}
KILL[1]= {3,5}
GEN[2] = {3 4}
KILL[2] = {1}
x z?
B3
5.
GEN[3] = KILL [3] =Ø
y :=x*z
GEN[4] ={5}
KILL[4] = {2}
B4
DEFOUT[I] =
(DEFIN[I] – KILL  GEN[I]
DEFIN[I]=  DEFOUT[j]
Bj is a predecessor of BI
iteration
DEFIN[1]
DEFOUT[1]
DEFIN[2]
DEFOUT[2]
DEFIN[3]
DEFOUT[3]
DEFIN[4]
DEFOUT[4]
Ø
1
Ø
Ø
Ø
Ø
Ø
Ø
Ø
Ø
Ø
1,2
1,2
3,4
3,4
Ø
Ø
5
2
3
4
3,4
2,3,4
2,3,4
1,2
1,2,4
1,2,4
1,2
1,2,4
1,2,4
2,3,4
2,3,4
2,3,4
2,3,4
2,3,4
2,3,4
3,4
2,3,4
2,3,4
3,4
2,3,4
2,3,4
5
3,4,5
3,4,5
121
BIT-VECTOR IMPLEMENTATION

Universal set: {1, 2, …, N} (e.g. {A, B, …, Z});   S ⊆ {1, 2, …, N}

[Figure: S is a Boolean array of length N; position i is true iff i ∈ S.]

const N = ?;
type SET = packed array [1..N] of boolean;

procedure UNION (A, B : SET; var C : SET);
var i : integer;
begin
  for i := 1 to N do
    C[i] := A[i] or B[i]
end;

MEMBER, INSERT, DELETE – O(1)
MAKENULL, ASSIGN, EQUAL, UNION, DIFFERENCE, INTERSECTION, EMPTY – O(N)
Linked-list implementation

• most general, size is unlimited
• efficient if the elements are ordered by "<"
• in that case, a set is represented as a sorted list a1, a2, …, an with
  a1 < a2 < … < an

Unsorted:
  MAKENULL, EMPTY – O(1)
  INSERT, MEMBER, DELETE, ASSIGN – O(n)
  EQUAL, UNION, INTERSECTION, DIFFERENCE – O(nm)

Sorted:
  MAKENULL, EMPTY – O(1)
  INSERT, MEMBER, DELETE, ASSIGN, EQUAL, UNION, INTERSECTION,
  DIFFERENCE – O(n) or O(n+m)

MEMBER can be improved to O(log n) if balanced search trees are used.
ADT Dictionary

SET with INSERT, DELETE, MEMBER, and MAKENULL.

Example: Dean's list database

program deanlist (input, output);
type name = packed array [1..20] of char;
     grade = -1..12;
var student : name;
    average : grade;
    database : DICTIONARY;        { of names }
begin
  MAKENULL(database);
  readln(student, average);
  while student <> '' do begin
    case average of
      10..12 : INSERT(student, database);
      9 :      ;
      0..8 :   DELETE(student, database);
      -1 :     if MEMBER(student, database)
               then writeln('yes')
               else writeln('no')
    end;
    readln(student, average)
  end
end
124
A modified dictionary
Type
Elementtype = record
Key : keytype;
Data : datatype
End;
Then
MAKENULL, INSERT,
DELETE
QUERY (x:keytype) : datatype;
INSERT((key,data), dictionary)
DELETE(KEY, dictionary)
QUERY(key, dictionary)
125
Implementation of dictionary
1.
Bit-vector if the universal ser is {1,2,…}
INSERT, DELETE, MEMBER – O(1)
2.
Sorted or
unsorted
o(n)
INSERT O(n)
DELETE or
MEMBER O(logn)
INSERT
O(n)
DELETE
MEMBER – o(n)
If set is
ordered
1.
Unsorted array (of some constant size)
Type
DICTIONARY = record
Last : 0..max+1
Data : array [1..max] of element type end;
Procedure MAKENULL (var : A DICTIONARY)
Begin
A last :=0
End;
0(1)
126
function MEMBER (x : elementtype; var A : DICTIONARY) : boolean;
var i : integer;
begin
  for i := 1 to A.last do                       { O(n) }
    if A.data[i] = x then return(true);
  return(false)
end

procedure INSERT (x : elementtype; var A : DICTIONARY);
begin
  if not MEMBER(x, A) then                      { O(n) }
    if A.last < max then begin
      A.last := A.last + 1;
      A.data[A.last] := x
    end
    else error('full')
end;

procedure DELETE (x : elementtype; var A : DICTIONARY);
var i : integer;
begin
  find the i s.t. A.data[i] = x, or i > A.last;  { O(n) }
  if i <= A.last then begin
    A.data[i] := A.data[A.last];                 { overwrite with the last element }
    A.last := A.last - 1
  end
end
Hashing – O(1) time per operation on average
(INSERT, DELETE, MEMBER)

[Figure: if the universal set is small, represent set S by an array indexed
by the elements a1 … an: put ai in cell i iff ai ∈ S.  This gives O(1) time
if rank(ai) = i can be computed in O(1) time.]

Generally, partition the elements into groups and let all elements in a
group share a cell.

O(1) time if h(x) = i (x is in group i) can be computed in O(1) time.

Perfect if elements from the same group never occur simultaneously!
Good if it is unlikely that two elements from the same group occur simultaneously!
Okay if not TOO MANY elements from the same group occur simultaneously!

Some hash functions:  i mod p;  sum of digits, e.g. h(135) = 1+3+5 = 9.
Hashing

Goal: O(1) per operation on average (INSERT, DELETE, MEMBER),
      Pr(time > C) << 1.0

Open hashing

Partition the elements into B classes (buckets).
Hashing function h(x) = i if x ∈ class i,  0 ≤ i ≤ B−1.

[Figure: a bucket table of B headers 0 … B−1; header i points to a linked
list of the elements currently in bucket i.]

Average time = 1 + N/B per operation.
If N ≤ C·B, the average time ≤ 1 + C.
129
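A minimal open-hashing sketch in C for a set of ints: B buckets, each a singly linked list, h(x) = x mod B. The bucket count and all names are ours.

#include <stdio.h>
#include <stdlib.h>

#define B 7                                      /* number of buckets */

typedef struct Cell { int key; struct Cell *next; } Cell;
static Cell *bucket[B];                          /* the bucket table (headers) */

static int h(int x) { return ((x % B) + B) % B; }

int member(int x) {
    for (Cell *c = bucket[h(x)]; c != NULL; c = c->next)
        if (c->key == x) return 1;
    return 0;
}
void insert(int x) {
    if (member(x)) return;                       /* keep the set property */
    Cell *c = malloc(sizeof *c);
    c->key = x;
    c->next = bucket[h(x)];                      /* push at the front: O(1) */
    bucket[h(x)] = c;
}
void delete(int x) {
    Cell **p = &bucket[h(x)];
    while (*p != NULL && (*p)->key != x) p = &(*p)->next;
    if (*p != NULL) { Cell *dead = *p; *p = dead->next; free(dead); }
}

int main(void) {
    insert(3); insert(10); insert(17); delete(10);
    printf("%d %d %d\n", member(3), member(10), member(17));   /* 1 0 1 */
    return 0;
}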
Closed Hashing

[Figure: a bucket table 0 … B−1 in which each bucket holds at most one element.]

Insert x:
  x is placed in bucket h1(x);
  if bucket h1(x) is already taken (a collision),
  then try bucket h2(x) (rehashing);
  if bucket h2(x) is taken, then try bucket h3(x); …

Member x:
  try buckets h1(x), h2(x), … until x is found or an empty bucket is met.
130
Example 2 – Sorting using a priority queue
(key ≡ priority)

procedure PQSort (var A : array [1..n] of …);
var pool : PRIORITY QUEUE of …;
    i : integer;
begin
  MAKENULL(pool);
  for i := 1 to n do
    INSERT(A[i], pool);
  for i := 1 to n do
    A[i] := DELETEMIN(pool)
end;

Obs: if INSERT and DELETEMIN are O(log n), then PQSort is O(n log n).
131
Previous implantation of sets
Bit -vector – O (N) DELETEMIN
Array -
O (n) INSERT & DELETEMIN
Linked list – unsorted O(n) DELETEMIN
- sorted O(n) INSERT
Hashing - DELETEMIN O(n)
Solution – heap partially ordered tree in [AHU]
1
3
2
parent ≤ child
3
5
9
4
5
6
8
7
6
8
9
10
10
3
4
3
9
9
9
18
1 2
10
4
6
5 6
8
9
7 8 9
10 10 18
132
10 11
9
DELETEMIN:

1. Remove the root (the minimum) and move the last element to the root.
2. Push the new root down: swap it with its smaller child as long as it is
   larger than one of its children.

   9,5,9,6,8,9,10,10,18  →  5,9,9,6,8,9,10,10,18  →  5,6,9,9,8,9,10,10,18

Generally, time = O(depth of tree) = O(log n)
(since 2^depth ≤ n < 2^(depth+1)).
INSERT(4, heap):

1. Place 4 in the first free position (a new leaf).
2. Bubble it up: swap it with its parent as long as it is smaller than its parent.

   3,5,9,6,8,9,10,10,18,9,4  →  3,5,9,6,4,9,10,10,18,9,8  →  3,4,9,6,5,9,10,10,18,9,8

time = O(depth) = O(log n)
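An array-based heap sketch in C matching the picture above (heap[1..n], parent at i/2, children at 2i and 2i+1, parent ≤ children); the fixed capacity and names are ours, and main replays the lecture example.

#include <stdio.h>

#define MAXHEAP 100
static int heap[MAXHEAP + 1];                   /* heap[1..n] */
static int n = 0;

void insert(int x) {                            /* bubble up: O(log n) */
    int i = ++n;
    heap[i] = x;
    while (i > 1 && heap[i] < heap[i / 2]) {
        int t = heap[i]; heap[i] = heap[i / 2]; heap[i / 2] = t;
        i = i / 2;
    }
}

int deletemin(void) {                           /* push down: O(log n) */
    int min = heap[1];
    heap[1] = heap[n--];                        /* move the last element to the root */
    int i = 1;
    while (2 * i <= n) {
        int c = 2 * i;                          /* pick the smaller child */
        if (c + 1 <= n && heap[c + 1] < heap[c]) c = c + 1;
        if (heap[i] <= heap[c]) break;
        int t = heap[i]; heap[i] = heap[c]; heap[c] = t;
        i = c;
    }
    return min;
}

int main(void) {
    int keys[] = {3, 5, 9, 6, 8, 9, 10, 10, 18, 9};   /* the example heap */
    for (int i = 0; i < 10; i++) insert(keys[i]);
    insert(4);                                         /* INSERT(4, heap) */
    for (int i = 0; i < 11; i++) printf("%d ", deletemin());
    printf("\n");                        /* prints 3 4 5 6 8 9 9 9 10 10 18 */
    return 0;
}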
A linear order’<’ is a relation on elements;
(i) for any two elements a and b
a < b, a > b, or a = b
(ii) a<b and b<c
 a <c
A set is ordered if a linear order’<’ on its members exists, e.g. sets of
integers
reals
character strings (by lexicographical order)
Note: the appearance order of elements in a set representation is
unimportant, e.g. (1,3,4) = (3,1,4} = {4,3,1}
A sorted list:
a1,a2 a3 a4,…. an-1, an
135
Representing ordered sets – binary search trees

Elements are ordered by '<'.
Operations of interest: MAKENULL, INSERT, MEMBER, DELETE, MIN.

Previous implementations:
  sorted linked list:  MEMBER – O(n)
  sorted array:        INSERT, DELETE – O(n)

Solution: binary search tree — at every node,
  left subtree < parent < right subtree.

[Figure: a binary search tree with root 20; 20's children are 15 and 30;
15's children are 10 and 17 (17 has left child 16); 30's children are 25 and
45 (25 has right child 28).]
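A short C sketch of the binary search tree (the example in main builds the tree pictured above; duplicates are ignored, names are ours):

#include <stdio.h>
#include <stdlib.h>

typedef struct Node { int key; struct Node *left, *right; } Node;

Node *insert(Node *t, int x) {                  /* O(depth) */
    if (t == NULL) {
        t = malloc(sizeof *t);
        t->key = x; t->left = t->right = NULL;
    }
    else if (x < t->key) t->left  = insert(t->left,  x);
    else if (x > t->key) t->right = insert(t->right, x);
    return t;
}
int member(const Node *t, int x) {              /* O(depth) */
    if (t == NULL)   return 0;
    if (x == t->key) return 1;
    return member(x < t->key ? t->left : t->right, x);
}
void inorder(const Node *t) {                   /* prints the set in sorted order */
    if (t == NULL) return;
    inorder(t->left); printf("%d ", t->key); inorder(t->right);
}

int main(void) {
    int keys[] = {20, 15, 30, 10, 17, 25, 45, 16, 28};
    Node *t = NULL;
    for (int i = 0; i < 9; i++) t = insert(t, keys[i]);
    inorder(t);  printf("\n");                  /* 10 15 16 17 20 25 28 30 45 */
    printf("%d %d\n", member(t, 17), member(t, 40));   /* 1 0 */
    return 0;
}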
Pascal Implementation
type
elementtype = record
key:real;
data:datatype
end;
nodetypes = (leaf, interior)
I
twothreernode = record
case kind : nodetypes of
leaf: (element:elementtype);
nterior (first,second,third:
↑ twothreenode;
lowofseconcd,lowofthird:real
end;
SET = ↑ twothreenode
need parent: ↑ Twothreenode?
2-3 three: 3-way B-tree
137
AVL trees (Adelson-Velskii, Landis)

Balanced binary search trees  [HS pp. 436-452]

[Figure: a tree whose left and right subtrees are AVL trees of depths dL and dR.]

  |dL − dR| ≤ 1

The empty tree and single nodes are also AVL trees.
An AVL tree is also called a height-balanced (or depth-balanced) binary tree.

Balance factor BF = dL − dR.

[Figure: an AVL search tree with keys 2, 7, 8, 10, 12, 15, 19, the balance
factor (+1, 0 or −1) written at each node.]

n_d : minimum number of nodes in an AVL tree of depth d.

Fact:  n_0 = 1,  n_1 = 2,  n_d = n_{d-1} + n_{d-2} + 1
(similar to the Fibonacci numbers F_d = F_{d-1} + F_{d-2})

n_d ≥ F_d ≈ c^d / √5,   where c = (1 + √5)/2 > 1
⟹ d ≤ log_c n_d + log_c √5,  i.e.  depth = O(log n)

MEMBER, INSERT, DELETE — O(log n)
139
Sets with MERGE and FIND

MERGE(A, B, C): if A ∩ B = ∅ then C := A ∪ B
environment: disjoint sets A1, A2, …, Am
FIND(x): the unique Ai s.t. x ∈ Ai

Example – the equivalence problem:

An equivalence relation '≡' on a set S satisfies
1. a ≡ a                      (reflexivity)
2. a ≡ b  ⟹  b ≡ a            (symmetry)
3. a ≡ b, b ≡ c  ⟹  a ≡ c     (transitivity)

e.g. congruence modulo K:  i ≡_K j  iff  (i − j) mod K = 0

Equivalence classes:  S = S1 ∪ S2 ∪ S3 ∪ …
such that  a, b ∈ Si ⟹ a ≡ b,  and  a ∈ Si, b ∈ Sj, i ≠ j ⟹ a ≢ b

e.g.  {0, K, 2K, …}, {1, K+1, 2K+1, …}, …, {K−1, 2K−1, …}
140
s = {a1,a2,a3,a4,a5,a6,a5,a6,a7}
Fortran: EQUIVALENCE
.
.
.
a11≡a12
a13≡a14
.
.
.
{ a1 } { a2 } { a3 } { a4 } { a 5 } { a6 } { a7 }
a1≡a1
{ a1, a2}
{ a3 } { a4 } { a5 } { a6 } { a7 }
a5≡a6
{ a1, a2}
{ a3} { a4} { a5, a6}
{ a7 }
a3≡a5
{ a1, a2}
{ a3, a5, a6} { a4}
{ a7 }
a4≡a7
{ a1, a2}
{ a3,a5, a6} { a4 a7}
ai≡aj
A = FIND(ai; B = FIND(aj );
MERGE (A,B,A);
MAKENULL(B);
∪ ={ a1, a2, …., an }
=A1∪A2 ∪…∪=Am
A Partition
ADT MFSET { A1, A2,… Am } component
1. MERGE(A,B): A:=A∪B or B:=A∪B
2. FIND(X)
3. INITIAL(A,x): A:={x}
141
A simple implementation
element-based
Type
MFSET = array[membertype]of set-id-type
∪ = {1, 2, …, 12}
1 2 3 4 5 6 7 8
9
10 11 12
2 1 1 2 3
2
4
1 4 3
4 3
= {(2, 3, 6}, {1, 4, 9}}
{5, 8,12}, {7, 10, 11}}
type
set-id-type = integer
membertype = 1…n
function
FIND(x:1..n; var C:MFSET);
Begin
O(1)
FIND := c(x)
End
Procedure MERGE (A, B:integer; var C:MFSET); // A: A∪B//
Var
X:1..n;
Begin
For x:=1 to n do
If C[x] = B then
O(n)
C[x] :=A
End;
142
By some minor improvement
N merges can be done in O(nlogn) time using member list).
A tree implementation
component-based
A
B
1
1
C
5
7
1
6
1
A = {1, 2, 3, 4}
B={5, 6}
MERGE (A,B)
A
C= {7}
1
2
5
3
6
4
Time=0(1)
143
FIND(x) – O(depth of the tree containing x).

* Weight rule: if we always merge the smaller tree into the larger tree,
  then depth ≤ log₂ n.
  (The root must contain the weight, i.e. the size, of its tree.)

Path compression: after FIND(x), make every node on the path from x to the
root point directly to the root.

[Figure: FIND(6) on a tree where 6 hangs off a long path 1–3–5–…; afterwards
every node on that path points directly to the root 1.]

With path compression only:
  n consecutive FINDs            – O(n) time
  n intermixed FINDs and MERGEs  – O(n log n)

With both path compression and the weight rule (*):
  n intermixed FINDs and MERGEs  – O(n·α(n)),

where α(n) = the least m s.t. n ≤ A(m, m), the pseudo-inverse of Ackermann's
function.  In practice α(n) ≤ 4, since A(4, 4) is a tower 2^2^…^2 of 65536 twos.
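An MFSET sketch in C with both ideas above: FIND compresses the path, MERGE links the smaller tree under the larger (weight rule). N and the names are ours; main replays the equivalence example.

#include <stdio.h>

#define N 16
static int parent[N + 1];                  /* parent[x] = x for a root */
static int size[N + 1];                    /* only meaningful at roots */

void initial(void) {
    for (int x = 1; x <= N; x++) { parent[x] = x; size[x] = 1; }
}
int find(int x) {                          /* with path compression */
    if (parent[x] == x) return x;
    parent[x] = find(parent[x]);           /* point directly at the root */
    return parent[x];
}
void merge(int a, int b) {                 /* union by size (weight rule) */
    a = find(a); b = find(b);
    if (a == b) return;
    if (size[a] < size[b]) { int t = a; a = b; b = t; }
    parent[b] = a;                         /* smaller tree under larger */
    size[a] += size[b];
}

int main(void) {
    initial();
    merge(1, 2); merge(5, 6); merge(3, 5); merge(4, 7);
    printf("%d %d\n", find(3) == find(6), find(1) == find(4));   /* 1 0 */
    return 0;
}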
Ordered Sets with MERGE, FIND, SPLIT

SPLIT(S, S1, S2, x):
  S1 := {a | a ∈ S and a < x}
  S2 := {a | a ∈ S and a ≥ x}

Longest Common Subsequence (LCS) Problem

Sequence = string, e.g. abcdaaa.
A subsequence of a sequence x is obtained by removing zero or more (not
necessarily contiguous) characters from x;
e.g. ab, aaaa, ada are subsequences of the sequence above.

Longest common subsequence of x and y: a longest sequence that is a
subsequence of both x and y.

e.g.  x = 1214321,  y = 25134121
      21421 is an LCS; 21321 is another one.

Applications: UNIX diff, DNA analysis, etc.

Solutions (|x| = n, |y| = m):
1. dynamic programming – O(nm)
2. O(p log n), where p is the size of
   {(i, j) | 1 ≤ i ≤ n, 1 ≤ j ≤ m, and xi = yj};
   worst case p = O(mn), in practice p = O(m+n).
147
Key idea
Input : A = a1a2…an
B = b1b2…bm
To find | LCS(A,B) |
For j:=1 to m do
Find | LCS( a1….ai,bi….bj) |
Def.
Sk = {i | |LCS(a1…aib1…bj)| = k
A=
12345678
1214321
B=
25134121
s1
s2
J
s0
s3
s4
s5
s6
s7
1
{1} {2,3,4,5,6,7} ∅
∅
∅
∅
∅
2
{1} {2,3,4,5,6,7} ∅
∅
∅
∅
∅
3
∅ {1,2} {3,4,5,6,7} ∅
∅
4
∅ {1,2} {3,4} {5,6,7} ∅
5
∅ {1,2} {3} {4, 5,6,7} ∅
6
∅
7
∅
148
8
∅
Def.  PLACES(a) = {i | 1 ≤ i ≤ n, ai = a}.

All the PLACES(a) can be obtained in O(n) time, assuming the alphabet is
finite (if not, O(n log n), or hashing).

e.g. PLACES(a) = {i1, i2, …, ik} is kept as a list with i1 > i2 > … > ik.

Intuitive fact: in iteration j (i.e. when considering bj), new matches
happen at PLACES(bj) in A.  These matches may move a position from Sk to Sk+1.

Rule: move r from Sk to Sk+1 (in iteration j) iff
  1. ar = bj  (i.e., r ∈ PLACES(bj)), and
  2. r−1 ∈ Sk.
149
Procedure LCS;
begin
  initialize S0 := {1, 2, …, n} and Si := ∅ for i = 1, 2, …, n;
  for j := 1 to m do
    { compute the Sk's for position j }
    for r in PLACES(bj) do begin
      k := FIND(r);
      if k = FIND(r-1) then begin
        SPLIT(Sk, Sk, S'k, r);
        MERGE(S'k, Sk+1, Sk+1)
      end
    end
end;

Obs: if FIND, MERGE and SPLIT can each be done in O(log n) time, then the
total time is

  O( Σ_{j=1}^{m} |PLACES(bj)| · log n ) = O(p·log n).
150
Data structure for sets S0, S1, …, Sn
2 -3 trees ! ! !
8
K
9
10
11
12
13
14
FIND ( r) : O(depth) = O(logn)
MERGES (S’k, Sk+1, Sk+1):
New Sk+1
S’k
New Sk+1
Sk+1
Sk+1
S’k
Similar to INSERT, repair
Takes O(logn) time
151
S’k
Sk+1
APLIT ( )
6
7
8
9
10
11
12
r=9
split at 9
8
6
10
7
12
9
9
6
11
7
8
time = O(logn)
152
10
11 12
Graphs: A Math Model
HW401
Hw401
Waterloo
Toronto
London
HW6
QEW
QEW
Hw403
Hamilton
Niagara falls
Toronto
Minneapolis
New York
Chicago
New Orleans
Miami
Flight Map
(Imaginary)
KNOW
Bob
Mary
Mary
y
Bob
friends
Alex
Mark
Alex
Mark
Sandy
Sandy
Misc: state transition diagrams
153
Directed Graphs (Digraphs)
V1 = {1, 2, 3, 4, 5}
E1 = {(1,2), (1,3), (2,3),(3,4),(4,5),(4,1),(5,1), (5,4)}
G1 : = (V1, E1)
1
A digraph G = (V, E)
5
2
V: set of verices/nodes
3
4
E: set of arcs/directed edges
The arc from vertex v to vertex w:
(v,w) v≠w
V➙w or
Tail
head
w is adjacent to v
|V| = n
|E| ≤ n(n-1)
= O(n2)
A path v1,v2, …vm s.t. the arcs (v1,v2,(v2,v3),…,(vm-1,vm)exist.
Length of the path : m-1
The path passes through v2, v3, …,vm-1
The path is simple if all vertices on the path are distinct, except possibly the
first and last.
154
(Simple) cycle:
a (simple) path of length at least one that begins and
ends at the same vertex.
e.g.
1, 2, 3, 4, 1, 3
is a path
1, 2, 3, 4, 5
is a simple path
1, 2, 3, 4, 5, 1
is a simple cycle
1, 2, 3, 4, 1, 3, 4
is a simple cycle
labelled diagraph
a
b
a
b
b
b
abab
abbaaaba
...
a
a
When the labels are numbers, the diagraph is also called a network or
weighted diagraph.
155
Representation of digraphs

1. List of edges, e.g. (1,2), (1,3), (2,3), …

2. Adjacency matrix
   G = (V, E), V = {1, 2, …, n}.  The adjacency matrix for G is an n × n
   Boolean matrix with
     A[i, j] = true (1)   if (i, j) ∈ E
             = false (0)  otherwise
   Space: O(n²) even if |E| << n².

3. Adjacency list
   [Figure: for each vertex 1 … 5 of G1, a linked list of the vertices
   adjacent to it:  1 → 2, 3;  2 → 3;  3 → 4;  4 → 5, 1;  5 → 1, 4.]
   Space: O(|E|); but to decide whether i → j we need O(n) time.
ADT DIGRAPH

Single-source shortest paths problem:

Given G = (V, E), arc costs and a source vertex, determine the cost of the
shortest path from the source to every other vertex.

[Figure: a digraph on vertices 1 … 6 with arc costs such as 15, 40, 100, 50,
20, 10, 18, 30; the labels (costs) must be ≥ 0, and a missing arc has cost
+∞, e.g. cost(2, 1) = +∞.]

Cost(v1, v2, …, vn) = Σ_{i=1}^{n-1} cost(vi → vi+1)

e.g. source = 1:
  to        2   3   4   5   6
  min cost  70  60  40  10  30
Dijkstra's algorithm

Source vertex = 1,  G = (V, E),  V = {1, 2, …, n}
D(i) = cost of the shortest path from 1 to i.

Let S ⊆ V be a set of vertices containing 1;
D_S(i) = cost of the shortest path from 1 to i that only passes through
vertices of S.  (S is called a restriction set.)

[Figure: arcs 1→3 of cost 10, 1→2 of cost 4, 2→3 of cost 5;
D(3) = 9, but D_S(3) = 10 if S = {1}.]

Fact:  D_V(i) = D(i)   and   D_{{1}}(i) = cost(1 → i).

Idea: let S ⊆ V be some set s.t. 1 ∈ S, and suppose we know D_S(i) for each
i ∈ V.  Then we can enlarge S as follows:
1. pick w ∈ V−S such that D_S(w) is minimum;
2. S := S ∪ {w};
3. D_S(i) := min( D_S(i),  D_S(w) + cost(w, i) ).

[Figure: the shortest path to w leaves S exactly once, at w, since
D_S(x) ≤ D_S(w) for any x ∈ S.]

Algorithm   { D[i] ≡ D_S(i) }
begin
  S := {1};
  for i := 1 to n do
    D[i] := cost(1, i);
  for i := 1 to n-1 do begin
    find w in V−S s.t. D[w] is a minimum;
    S := S ∪ {w};
    for j := 2 to n do
      D[j] := min(D[j], D[w] + cost(w, j))
  end
end;

Obs.  D[i] = D(i) if i ∈ S  (D_S(i) = D(i) for i ∈ S);
thus there is no need to update D[i] if i ∈ S.
Example
10
2
20
3
10
60
10
30
10
1
6
40
10
4
5
30
50
2
6
3
30
60 = d[3]
1
0
5
10
1
0
5
10
+∞
40
6
3
2
30
4
3
60
230
60
4
1
0
+∞
5
40
160
10
4
40
6
+∞
6
3
2 30
1
1
0
5
10
6
3
2 30
+∞
4
40
0
5
10
161
+∞
4
40
procedure Dijkstra;
{ C[i, j] = cost(i, j) }
begin
1.  S := {1};
    for i := 2 to n do
      D[i] := C[1, i];
2.  for i := 1 to n-1 do begin
      find a w in V−S such that D[w] is a minimum;
      S := S ∪ {w};
      for each vertex v in V−S do
        D[v] := min(D[v], D[w] + C[w, v])
    end
end;

How do we recover the shortest paths themselves?

Time = O(n²).

With adjacency lists of costs and a priority queue for V−S:
time O(|E| log n).
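A direct O(n²) C sketch of the procedure above on a cost matrix (INF plays the role of +∞, vertices are 1..n, source is 1; the example graph and all names are made up for illustration):

#include <stdio.h>

#define NV  6
#define INF 1000000

int C[NV + 1][NV + 1];      /* C[i][j] = cost(i,j), INF if no arc */
int D[NV + 1];              /* D[i] = cost of the shortest path 1 -> i */
int inS[NV + 1];            /* membership in S */

void dijkstra(int n) {
    for (int i = 1; i <= n; i++) { D[i] = C[1][i]; inS[i] = 0; }
    inS[1] = 1;  D[1] = 0;
    for (int k = 1; k < n; k++) {
        int w = 0;                                 /* pick w in V-S with minimum D[w] */
        for (int v = 1; v <= n; v++)
            if (!inS[v] && (w == 0 || D[v] < D[w])) w = v;
        inS[w] = 1;
        for (int v = 1; v <= n; v++)               /* relax arcs leaving w */
            if (!inS[v] && D[w] + C[w][v] < D[v])
                D[v] = D[w] + C[w][v];
    }
}

int main(void) {
    for (int i = 1; i <= NV; i++)
        for (int j = 1; j <= NV; j++) C[i][j] = (i == j) ? 0 : INF;
    C[1][2] = 10; C[1][4] = 30; C[1][6] = 100;
    C[2][3] = 50; C[3][6] = 10; C[4][3] = 20; C[4][6] = 60; C[6][5] = 5;
    dijkstra(NV);
    for (int i = 2; i <= NV; i++) printf("D[%d]=%d ", i, D[i]);
    printf("\n");     /* D[2]=10 D[3]=50 D[4]=30 D[5]=65 D[6]=60 */
    return 0;
}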
All-Pairs Shortest Paths Problem
Given: a digraph with nonnegative arc costs
Goal: for each pair v, w of vertices find the cost of the shortest path from v
to w.
Application: construction of shortest flying time table
Solution 1: repeat Dijkstra’s algorithm with source = 1 ,2,...,n
time: 0(n3) or 0(n | E | logn)
Solution 2: Floyd’s algorithm
let D(i,j) and Ds(i,j) be as before
D(i,j): distance from i to j
D distance from i to j under restrictions.
163
Floyd's Idea:

Let Sk = {1, 2, …, k}, 0 ≤ k < n, and suppose D_{Sk}(i, j) is known for all
1 ≤ i, j ≤ n.  Then, with Sk+1 = {1, 2, …, k+1},

  D_{Sk+1}(i, j) = min( D_{Sk}(i, j),  D_{Sk}(i, k+1) + D_{Sk}(k+1, j) )

for all 1 ≤ i, j ≤ n.  Thus we compute

  D_{S0}(i, j) = cost(i, j),  D_{S1}(i, j),  …,  D_{Sn}(i, j) = D(i, j).
164
In the following procedure, A is an n×n matrix with
A[i, j] = D_{Sk}(i, j) after the k-th iteration.

procedure Floyd (var A : array [1..n, 1..n] of real; C : …);
var i, j, k : integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      A[i, j] := C[i, j];                   { A = D_{S0} }
  for i := 1 to n do
    A[i, i] := 0;
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        if A[i, k] + A[k, j] < A[i, j] then
          A[i, j] := A[i, k] + A[k, j]
        { A[i,j] := min(A[i,j], A[i,k] + A[k,j]) }
end;

time = O(n³)
165
Recovering the paths
Use an n×n matrix P; initially P[i,j] := 0 for 1 ≤ i, j ≤ n.
In procedure Floyd, replace the inner update with:
  if A[i,k] + A[k,j] < A[i,j] then begin
    A[i,j] := A[i,k] + A[k,j];
    P[i,j] := k
  end
Meaning: the shortest path from i to j passes through vertex k.
procedure path(i, j: integer);
// print a shortest path from i to j //
var k: integer;
begin
  k := P[i,j];
  if k ≠ 0 then begin   // the path is not direct //
    path(i,k);
    writeln(k);
    path(k,j)
  end
end;
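For illustration only, here is a small Python sketch (mine, not the notes') of Floyd's algorithm together with the path-recovery matrix P described above. The cost matrix C is a hypothetical input with math.inf for missing arcs.

import math

def floyd(C):
    """All-pairs shortest path costs A and intermediate-vertex matrix P."""
    n = len(C)
    A = [row[:] for row in C]
    P = [[-1] * n for _ in range(n)]      # -1 plays the role of "0 = direct"
    for i in range(n):
        A[i][i] = 0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:
                    A[i][j] = A[i][k] + A[k][j]
                    P[i][j] = k           # shortest i -> j path passes through k
    return A, P

def path(P, i, j, out):
    """Append the intermediate vertices of a shortest i -> j path to out."""
    k = P[i][j]
    if k != -1:                            # the path is not direct
        path(P, i, k, out)
        out.append(k)
        path(P, k, j, out)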
166
Transitive closure of adjacency matrix
Given: digraph G = (V,E) represented by adjacency matrix C
Goal: for each pair i,j, decide whether there exists a path from i to j:
A[i,j] = 1 (true) if there is a path from i to j, 0 (false) otherwise, 1 ≤ i, j ≤ n
A is called the transitive closure of C.
Solution 1: Use Floyd's algorithm.
Initialize A[i,j] := +∞ if C[i,j] = 0, and A[i,j] := 1 otherwise.
At the end, set A[i,j] := 1 if A[i,j] ≠ +∞, and 0 if A[i,j] = +∞.
Solution 2: Simplified Floyd's algorithm (Warshall's algorithm)
in iteration k:  A[i,j] := A[i,j] or (A[i,k] and A[k,j])
167
procedure Warshall(var A: array[1..n, 1..n] of boolean; C: ...);
var i,j,k: integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      A[i,j] := C[i,j];
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        if A[i,j] = false then
          A[i,j] := A[i,k] and A[k,j]
end;
time = O(n³)
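For comparison, a minimal Python sketch of Warshall's algorithm (my own, not from the notes), operating on a hypothetical Boolean adjacency matrix.

def warshall(C):
    """Transitive closure of a Boolean adjacency matrix C (list of lists)."""
    n = len(C)
    A = [row[:] for row in C]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if not A[i][j]:
                    A[i][j] = A[i][k] and A[k][j]
    return A

# hypothetical 3-vertex digraph 0 -> 1 -> 2
C = [[False, True, False],
     [False, False, True],
     [False, False, False]]
print(warshall(C)[0][2])   # True: there is a path from 0 to 2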
A = C + C·C + C·C·C + … + C^(n−1)
'·' is Boolean matrix multiplication (and / or).
It is known that C·C can be done in O(n^2.376) time.
To obtain A, compute C², C⁴, C⁸, …, up to C^(n−1)  (about log₂n products).
time: O(log n · n^2.376) < O(n³)
168
Undirected graphs
[Figure: an undirected graph on vertices 1-5]
G = (V,E)
E = {(1,2), (2,3), (3,4), (4,5), (5,1), (1,3), (2,5)}
(u,v) and (v,u) denote the same edge.
(u,v) is incident upon u and v.
v1, v2, …, vn is a path if the edges (v1,v2), (v2,v3), …, (vn−1,vn) exist.
The path v1, v2, …, vn connects v1 and vn.
Definitions of simple path and cycle are the same as for digraphs, except that a cycle must have length ≥ 3.
G1 = (V1,E1) is a subgraph of G2 = (V2,E2) if V1 ⊆ V2 and E1 ⊆ E2.
If E1 contains all edges (u,v) in E2 such that u,v ∈ V1, G1 is called an induced subgraph of G2.
[Figure: a graph on vertices 1, 2, 3 and one of its induced subgraphs]
169
Graph G is connected if every pair of G’s vertices is connected by some
path
Connected component of G: a maximal connected induced subgraph of G
G is cyclic if G contains at least one cycle
G is acyclic if G doesn’t contain any cycles
Free tree:
a connected acyclic graph
Fact:
1. Every n-node free tree has n−1 edges
2. If we add any edge to a free tree, we get a cycle
Claim:
If n>1, there must be a vertex with degree (i.e., number of
edges incident upon the vertex) =1
170
Proof of claim
Let G be a free tree with > 1 node.
Suppose that G's nodes all have degree > 1.
[Figure: a walk v1, v2, v3, …, vi, vi+1, vi+2, … that never has to stop, since every vertex has another edge to leave by; some vertex must eventually repeat]
∴ a cycle exists. A contradiction!!
Proof of (1): true if n = 1.
Suppose (1) is true for n = k.
Let G = (V,E) be a (k+1)-node free tree.
Let u be a vertex of degree 1 and (u,w) its only incident edge.
G' = (V−{u}, E−{(u,w)}) is a free tree.
By the induction hypothesis, G' has k−1 edges.
∴ G has k edges.
Proof of (2): if adding an edge creates no cycle, then the graph is still a free tree, but its number of edges is n. Contradiction!!
171
Representation
Adjacency matrix:
symmetric, i.e. entry i,j = entry j,i
Adjacency list:
redundancy, i.e. if edge (u,v) exists, then u is on
the list for v and v is on the list for u.
Minimum-cost spanning tree
G = (V,E) is connected. Each edge (u,v) ∈ E has a cost C(u,v) (= C(v,u)).
A spanning tree of G is a subgraph of G which is a free tree connecting all vertices in V.
The cost of a spanning tree is the sum of the costs of the edges in the tree.
[Figure: a weighted graph (edge costs 3, 8, 11, 13, 15, 20, 20, 30, 30) and one of its spanning trees]
172
The MST Property:
Let G = (V,E) be a connected graph
Let U ⊆ V be a proper subset of V.
If (u,v) is an edge of lowest cost s.t. u ∈ U and v ∈ V−U, then there is a minimum-cost spanning tree that includes (u,v) as an edge.
[Figure: the cut between U and V−U; edges (u,v) and (u′,v′) both cross the cut, with C(u,v) ≤ C(u′,v′)]
procedure Prim(G: graph; var T: set of edges);
// constructs a minimum-cost spanning tree T //
var
  U: set of vertices;
  u, v: vertex;
begin
  T := ∅;  U := {1};
  while U ≠ V do begin
    find a lowest-cost edge (u,v) s.t. u ∈ U and v ∈ V−U;
    T := T ∪ {(u,v)};
    U := U ∪ {v}
  end
end;
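A minimal Python sketch of Prim's procedure (my illustration, not the notes'), assuming the graph is given as a hypothetical symmetric cost matrix with math.inf for missing edges and vertex 0 in place of vertex 1.

import math

def prim(C):
    """Return the edges of a minimum-cost spanning tree of a connected graph."""
    n = len(C)
    U = {0}                      # plays the role of U := {1}
    T = []
    while len(U) < n:
        # lowest-cost edge (u, v) with u in U and v in V - U
        u, v = min(((u, v) for u in U for v in range(n) if v not in U),
                   key=lambda e: C[e[0]][e[1]])
        T.append((u, v))
        U.add(v)
    return T

INF = math.inf
C = [[INF, 1, 3, INF],
     [1, INF, 2, 5],
     [3, 2, INF, 4],
     [INF, 5, 4, INF]]
print(prim(C))   # [(0, 1), (1, 2), (2, 3)]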
173
An example:
[Figure: Prim's algorithm on a 5-vertex weighted graph; U grows as {1}, {1,2}, {1,2,5}, {1,2,3,5}, {1,2,3,4,5}, and the chosen edges form the minimum-cost spanning tree]
174
Kruskal's algorithm
("connected component" below means a connected component w.r.t. the edges already in T)
procedure Kruskal (G:graph;var T:set of edges);
var u,v : vertex;
E’ : set of edges;
begin
E’ := E;
T := ∅ ;
while E’ ≠ ∅ do begin
find a lowest cost edge (u,v) in E’;
E’ := E’ – {(u,v)};
If u and v are not in the same connected component then
T:= T∪{(u,v)};
end
end;
Implementation of the component test:
K1 := FIND(u);  K2 := FIND(v);
if K1 ≠ K2 then MERGE(K1, K2) …        O(α(n)) amortized time per operation
E′: PRIORITY QUEUE;  the components w.r.t. T: MFSET
Time: O(e log e), e = |E|
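A Python sketch of Kruskal's procedure (my own, under simplifying assumptions): a small union-find with path compression stands in for FIND/MERGE on the MFSET, and a sorted edge list stands in for the priority queue.

def kruskal(n, edges):
    """edges: list of (cost, u, v) with vertices 0..n-1; returns MST edges."""
    parent = list(range(n))

    def find(x):                       # FIND with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    T = []
    for cost, u, v in sorted(edges):   # lowest-cost edge first
        k1, k2 = find(u), find(v)
        if k1 != k2:                   # u and v are in different components
            parent[k1] = k2            # MERGE
            T.append((u, v))
    return T

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]   # hypothetical
print(kruskal(4, edges))   # [(0, 1), (1, 2), (2, 3)]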
175
Example:
[Figure: Kruskal's algorithm on the same 5-vertex weighted graph, examining edges in order of increasing cost]
add (3,5) to T
add (4,5); discard (3,4)
add (2,5); discard (2,3)
add (1,2); discard (1,3), (1,5)
176
Graph Traversal and Search
Digraphs - depth-first search
go as far as you can following the arcs!
type
  digraph = array[1..n] of ↑ adjacency list;
  vertex = 1..n;
var
  v: vertex;
  mark: array[vertex] of (visited, unvisited);

for v := 1 to n do mark[v] := unvisited;
for v := 1 to n do
  if mark[v] = unvisited then dfs(v);      // total time O(e) //
procedure dfs(v: vertex);
var w: vertex;
begin
  mark[v] := visited;
  print v;   // or anything //
  for each vertex w on L[v] do
    if mark[w] = unvisited then
      dfs(w)
end;
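A Python sketch of the same depth-first traversal (my illustration), with the digraph represented as a hypothetical dict of adjacency lists.

def dfs_all(graph):
    """graph: dict vertex -> list of successors. Returns vertices in DFS order."""
    mark = {v: 'unvisited' for v in graph}
    order = []

    def dfs(v):
        mark[v] = 'visited'
        order.append(v)                 # stands in for "print v"
        for w in graph[v]:
            if mark[w] == 'unvisited':
                dfs(w)

    for v in graph:                     # restart at every unvisited vertex
        if mark[v] == 'unvisited':
            dfs(v)
    return order

g = {1: [2, 3], 2: [4], 3: [], 4: [1]}   # hypothetical digraph
print(dfs_all(g))   # [1, 2, 4, 3]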
177
Example
[Figure: a 10-vertex digraph]
DFS order : 1, 2, 4, 7, 10, 5, 3, 6, 8, 9
Depth-first spanning forest:
[Figure: the df spanning forest, with each vertex labelled by its dfnumber 1-10]
forward arc: ancestor → descendant, e.g. (3,8)
back arc: descendant → ancestor, e.g. (7,1)
cross arc: all the others, e.g. (7,4), (9,1)
178
Fact: if (v,w) is a
(1) tree/forward arc, then dfnumber(v) < dfnumber(w);
(2) back/cross arc, then dfnumber(v) > dfnumber(w).
An application- test for acyclicity
Fact: a digraph is cyclic iff a back arc is encountered in any DFS.
[Figure: a cycle v → … → w → v, where dfnumber(v) is the smallest on the cycle, so (w,v) is a back arc]
How do we spot a back arc?
In dfs, include a dfnumber for each node encountered. Also, keep the current path in an array.
179
Breadth-first search
go as broadly as possible
procedure bfs(v);
var Q: QUEUE of vertex;
    x, y: vertex;
begin
  mark[v] := visited;
  print v;   // or anything //
  MAKENULL(Q);
  ENQUEUE(v, Q);
  while not EMPTY(Q) do begin
    x := FRONT(Q);
    DEQUEUE(Q);
    for each vertex y adjacent to x do
      if mark[y] = unvisited then begin
        mark[y] := visited;
        ENQUEUE(y, Q)
      end
  end
end;
time = O(e)
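The corresponding Python sketch for breadth-first search (mine, using the same hypothetical adjacency-list representation and a deque as the queue).

from collections import deque

def bfs(graph, v, mark, order):
    """Visit every vertex reachable from v in breadth-first order."""
    mark[v] = 'visited'
    order.append(v)
    Q = deque([v])
    while Q:
        x = Q.popleft()                 # FRONT + DEQUEUE
        for y in graph[x]:
            if mark[y] == 'unvisited':
                mark[y] = 'visited'
                order.append(y)
                Q.append(y)             # ENQUEUE

g = {1: [2, 5], 2: [4, 6], 5: [3], 4: [7], 6: [], 3: [], 7: []}  # hypothetical
mark = {v: 'unvisited' for v in g}
order = []
bfs(g, 1, mark, order)
print(order)    # [1, 2, 5, 4, 6, 3, 7]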
[Figure: a 7-vertex digraph]
BFS order: 1, 2, 5, 4, 6, 3, 7
bfnumber, bf spanning forest
180
(Undirected) Graphs
DFS: very similar to DFS for digraphs.
[Figure: a 10-vertex undirected graph]
DFS order: 1, 2, 4, 3, 6, 5, 7, 8, 10 , 9
dfs spanning forest (if the graph is connected: a df spanning tree)
[Figure: the df spanning forest, with dfnumber(v) shown at each vertex]
Tree edges: the forest edges
Back edges: (1,4), (4,6), (2,5)
No cross edges!!!
181
BFS:
For the above graph, the BFS order is:
1,2,3,4,5,6,7,8,9,10
Applications of DFS and BFS:
1. Test for acyclicity
   acyclic iff no back edges
2. Test for connectivity
   connected iff there is only one tree in the DFS/BFS spanning forest;
   generally, each tree in the forest gives a connected component.
3. Biconnected components (next lecture)
182
Articulation points and biconnected components
Flight Map
[Figure: a flight-map graph on 8 vertices]
Articulation point: a vertex whose removal makes the remaining graph disconnected.
Def. A vertex v is called an articulation point (or cutpoint) if there exist vertices x, w s.t. x ≠ v, w ≠ v, x ≠ w, and v is on every path connecting x and w.
Def. A connected graph is biconnected if it does not have any articulation
points.
183
Fact: The following are equivalent:
1. G is biconnected
2. Deletion of any single vertex fails to disconnect G
3. Every pair of vertices is connected by two disjoint paths (n ≥ 3)
Def. A connected graph is k-connected if deletion of any k−1 vertices fails to disconnect the graph.
Def. A connected graph is k edge-connected if deletion of any k−1 edges fails to disconnect the graph.
Biconnected component (or bicomponent): a maximal induced biconnected subgraph; e.g. the above graph has 5 bicomponents.
[Figure: the 5 bicomponents of the flight-map graph]
184
Problem
Given a connected graph G, identify all its articulation points and
bicomponents.
Trivial algorithm: O(n·e). We want O(e)!
To identify the articulation points:
Step 1: Do a depth-first search of G.
Note:
1. there is a single df spanning tree (G is connected)
2. there are only tree and back edges
185
Fact:
1. A leaf cannot be an articulation point.
2. The root is an articulation point iff it has more than one child.
3. Let v be an interior node other than the root. v is an articulation point iff some subtree of v has no back edge incident upon a proper ancestor of v.
Obs. Let w be any proper descendant of v and (w,x) be a back edge. x is a proper ancestor of v iff dfnumber(x) < dfnumber(v).
Def.
low(v) = the smallest dfnumber of v or of any node reachable by
following a back edge from some descendent of v (including v itself).
186
[Figure: a df spanning tree on 11 nodes (labelled by dfnumber) with its back edges]
Dfnumber:  1  2  3  4  5  6  7  8  9  10  11
Low:       1  2  1  1  5  1  6  6  9  1   1

Low(v) = min of:
  dfnumber(v),
  dfnumber(x) s.t. (v,x) is a back edge,
  Low(y) for any child y of v
Step 2: Traverse the df spanning tree in postorder and compute low(v) for all nodes v.
Note: if v is a leaf,
  Low(v) = min( dfnumber(v), dfnumber(x) s.t. (v,x) is a back edge )
Step 3: Identify articulation points by traversing the tree in postorder.
(This step can be done in parallel with Step 2.)
187
An interior node v is an articulation point iff for some child w of v,
  low(w) ≥ dfnumber(v)
Step 4: In Step 3, whenever an articulation point v is found, delete the subtree rooted at w and output the bicomponent given by that subtree together with v.
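A compact Python sketch (my own, not the notes') of Steps 1-3: one DFS assigns dfnumbers, computes low(), and reports the articulation points of a connected undirected graph given as a hypothetical dict of adjacency lists.

def articulation_points(graph, root):
    """Return the set of articulation points of a connected undirected graph."""
    dfnumber, low, points = {}, {}, set()
    counter = [1]

    def dfs(v, parent):
        dfnumber[v] = low[v] = counter[0]
        counter[0] += 1
        children = 0
        for w in graph[v]:
            if w not in dfnumber:                  # tree edge
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if v != root and low[w] >= dfnumber[v]:
                    points.add(v)                  # interior articulation point
            elif w != parent:                      # back edge
                low[v] = min(low[v], dfnumber[w])
        if v == root and children > 1:
            points.add(v)                          # root with more than one child

    dfs(root, None)
    return points

g = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}   # hypothetical graph
print(articulation_points(g, 1))    # {3}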
[Figure: a graph on vertices A-H, its df spanning tree and back edges; the bicomponents are output as the subtrees rooted below the articulation points]
Time : O(e)
Matching in Graphs
188
[Figure: a bipartite graph pairing teachers with courses]
G = (V, E) is a graph
A matching in G is a set of edges with no two edges incident upon the same vertex.
A matching is maximal if the number of its edges is the maximum possible.
A matching is complete (perfect) if every vertex in V is an endpoint of some edge in the matching.
G is bipartite if V = V1 ∪ V2, V1 ∩ V2 = ∅, and each edge in E has one end in V1 and the other end in V2.
Problem:
Given a bipartite G, find a maximal matching in G.
Solution #1: (Brute force) Enumerate all possible matchings. Pick one with the largest number of edges.
Time: O(n!) = O(nn)
Solution #2: Augmenting paths
189
Time: O(ne)
e.g. M = {(2,7), (3,6), (4,9)}
Let M be a matching.
A vertex v is matched if it is an endpoint of an edge in M; e.g., 2, 3, 4, 6, 7, 9 are matched.
An augmenting path relative to M: a path connecting two unmatched vertices in which alternate edges of the path are in M.
e.g.  P1 = 1, 6, 3, 9, 4, 10   and   P2 = 5, 10
190
Fact: if P is an augmenting path relative to M, then M ⊗ P is a bigger matching.
e.g.
M ⊗ P1 = {(2,7), (1,6), (3,9), (4,10)}
M ⊗ P2 = {(3,6), (2,7), (4,9), (5,10)}
(⊗ is the exclusive-or on sets, i.e. A ⊗ B = (A−B) ∪ (B−A), the symmetric difference)
Fact: M is maximal iff there is no augmenting path relative to M.
Proof: "only if": straightforward.
"if": i.e., if M is not maximal then there must be an augmenting path.
Let N be a matching s.t. |N| > |M|.
Then each connected component of (V, N ⊗ M) must be one of the following:
1. a simple cycle with edges alternating between N and M (equal numbers from each)
2. an augmenting path relative to N (one more edge from M than from N)
3. an augmenting path relative to M (one more edge from N than from M)
4. a path with an equal number of edges from N and M
Since N ⊗ M has more edges from N than from M, some component must be of type 3, i.e. an augmenting path relative to M exists.
191
Algorithm
M := ∅;
repeat
  find an augmenting path P relative to M;
  M := M ⊗ P
until no more augmenting paths exist
[Figure: a bipartite graph with V1 = {1,…,5} and V2 = {6,…,10}]
192
[Figure/table: a trace of the algorithm on the graph above; the matching M ⊗ P after successive augmentations is {(1,6)}, then {(3,6),(1,8)}, then {(2,8),(1,6),(3,9)}, then {(2,8),(1,6),(3,9),(4,7)}]
Algorithm to find an augmenting path relative to matching M
// G = (V,E), V = V1 ∪ V2 //
Build an augmenting graph level by level as follows:
  level 0 := the unmatched vertices in V1;
  repeat
    level 2i+1 := new vertices that are adjacent to a vertex at level 2i through an edge not in M; also add the edge;
    level 2i+2 := new vertices that are adjacent to a vertex at level 2i+1 through an edge in M; also add the edge;
Stop when an unmatched vertex is added at an odd level, or no more vertices can be added (i.e. no augmenting path exists).
The path from that vertex back to any vertex at level 0 is an augmenting path.
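For illustration, here is a Python sketch (mine) of the overall matching algorithm. Instead of building the level graph explicitly, it uses the DFS variant of the augmenting-path search (Kuhn's algorithm), which explores the same alternating edges and produces a matching of the same maximum size; the adjacency lists are hypothetical.

def max_bipartite_matching(V1, adj):
    """adj: dict u in V1 -> list of neighbours in V2. Returns the matching for V1."""
    match = {}                         # vertex -> its partner, for both sides

    def augment(u, seen):
        # try to find an augmenting path starting at u in V1
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or v's current partner can be re-matched elsewhere
            if v not in match or augment(match[v], seen):
                match[u], match[v] = v, u
                return True
        return False

    for u in V1:
        augment(u, set())
    return {u: match[u] for u in V1 if u in match}

adj = {1: [6, 7, 8], 2: [6, 9], 3: [6, 7], 4: [9, 10], 5: [6]}   # hypothetical
print(max_bipartite_matching([1, 2, 3, 4, 5], adj))
# {1: 8, 2: 9, 3: 7, 4: 10, 5: 6}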
193
Example
[Figure: a bipartite graph with V1 = {1,…,5}, V2 = {6,…,12}, and the augmenting graph built level by level, starting from the unmatched vertices of V1 at level 0]
194
The process is very similar to BFS
Time: O(e) if adjacency lists are used
Internal Sorting
Internal:
data are stored in the main memory which is a RAM. Thus,
access to each data item takes constant time.
Data Item:
a record with one or more fields. One field contains the key of
the record.
'≤' is a linear ordering on keys (compare with '<').
Sorting: arrange a sequence of records so that the keys form a nondecreasing sequence:
r1, r2, …, rn  ⇒  ri1, ri2, …, rin  s.t.  ri1.key ≤ ri2.key ≤ … ≤ rin.key
195
Bubble Sort
Move the lighter records to the top.
for i := 1 to n−1 do
  for j := n downto i+1 do
    if A[j].key < A[j−1].key then
      swap(A[j], A[j−1])
In place.  Time: O(n²).  Bad input: a descending sequence.
Insertion Sort
Insert A[i] into A[1], A[2], ..., A[i−1] at its rightful position.
A[0].key := −∞;
for i := 2 to n do begin
  j := i;
  while A[j].key < A[j−1].key do begin
    swap(A[j], A[j−1]);
    j := j−1
  end
end;
In place.  Time: O(n²) on a descending sequence.
196
Selection Sort
Select the smallest record and place it at its rightful position.
for i := 1 to n−1 do begin
  select the smallest among A[i], …, A[n];
  swap it with A[i]
end
Time: O(n²).  In place.  Better than bubble sort when the records are large.
Shell Sort (diminishing-increment)
[Example: sort with increment 6, then increment 3, …]
Time: O(n^(3/2)), in place, for some increment sequences.
197
Heap Sort
Q:
PRIORITY QUEUE
for i:=1 to n do
INSERT (A[i],Q);
for i := n down to 1 do
A[i] := DELETEMIN(Q);
Time: O(nlogn)
in place if Q is implemented using the array A[1..n]
Details in [AHU]
Quick Sort
if A[i..j] contains two distinct keys then begin
  find the larger of the first two distinct keys, v (called the pivot);
  arrange A[i..j] so that, for some k with i+1 ≤ k ≤ j,
    A[i], …, A[k−1] < v  and  A[k], …, A[j] ≥ v;
  quicksort(i, k−1);
  quicksort(k, j)
end
198
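A short Python sketch (my own) of this quicksort variant: the pivot is the larger of the first two distinct keys, and the subarray is partitioned into keys < v and keys ≥ v before recursing.

def quicksort(A, i, j):
    """Sort A[i..j] in place (inclusive bounds)."""
    v = pivot(A, i, j)
    if v is None:
        return                          # fewer than two distinct keys
    k = partition(A, i, j, v)           # A[i..k-1] < v, A[k..j] >= v
    quicksort(A, i, k - 1)
    quicksort(A, k, j)

def pivot(A, i, j):
    """Larger of the first two distinct keys in A[i..j], or None."""
    for p in range(i + 1, j + 1):
        if A[p] != A[i]:
            return max(A[i], A[p])
    return None

def partition(A, i, j, v):
    """Rearrange A[i..j]; return k such that A[i..k-1] < v <= A[k..j]."""
    l, r = i, j
    while l <= r:
        while A[l] < v:
            l += 1
        while A[r] >= v:
            r -= 1
        if l < r:
            A[l], A[r] = A[r], A[l]
    return l

A = [5, 7, 2, 1, 4, 3, 9, 5, 1, 7]      # hypothetical input
quicksort(A, 0, len(A) - 1)
print(A)    # [1, 1, 2, 3, 4, 5, 5, 7, 7, 9]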
Example
[Figure: quicksort trace on 5 7 2 1 4 3 9 5 1 7; successive partitions (pivots 7, 5, 3, 2, 4, 9) yield the sorted list 1 1 2 3 4 5 5 7 7 9]
199
Worst-case time complexity: O(n²)
pivot(i,j): O(j−i+1)
partition(i,j,pivot): O(j−i+1)
T(n) = O(n) + T(n−1) = … = O(n²)
Not in place!  The stack space can be made O(log n).
Average time complexity
Assumptions:
1. all orderings are equally likely
2. the keys are distinct
Pr(first group has size i)
  = Pr(A[1] is the (i+1)-st smallest and A[1] is the pivot)
  + Pr(A[2] is the (i+1)-st smallest and A[2] is the pivot)
  = (1/n)(i/(n−1)) + (1/n)(i/(n−1)) = 2i/(n(n−1))
200
Tavg(1) = c0
Tavg(n) ≤ Σ_{i=1}^{n−1} [2i/(n(n−1))] · [Tavg(i) + Tavg(n−i)] + cn
        ≤ [2/(n−1)] · Σ_{i=1}^{n−1} Tavg(i) + cn
201
Suppose Tavg(i) ≤ k·i·log i for some constant k, 2 ≤ i < n. Then
Tavg(n) ≤ [2k/(n−1)] · Σ_{i=1}^{n−1} i·log i + cn
        = [2k/(n−1)] · ( Σ_{i=1}^{n/2} i·log i + Σ_{i=n/2+1}^{n−1} i·log i ) + cn
        ≤ [2k/(n−1)] · ( Σ_{i=1}^{n/2} i·(log n − 1) + Σ_{i=n/2+1}^{n−1} i·log n ) + cn
        ≤ k·n·log n − kn/4 − kn/(2(n−1)) + cn
        ≤ k·n·log n,  if k is large enough
∴ Tavg(n) = O(n log n)
2-way merge sort
• divide-and-conquer
• can be used for external sorting
• can be generalized to m-way
Algorithm Msort(A[1..n]);
if n > 1 then begin
  m := ⌊n/2⌋;
  Msort(A[1..m]);
  Msort(A[m+1..n]);
  Merge(A[1..m], A[m+1..n], B[1..n]);
  A[1..n] := B[1..n]
end;
Let k = 2^⌈log n⌉ (i.e. k is the smallest power of 2 that is ≥ n). If Merge takes O(n) time:
T(n) ≤ T(k) = 2T(k/2) + ck
            = 4T(k/4) + 2ck
            = 8T(k/8) + 3ck
            = …
            = ck·log₂k + k·O(1) = O(n log n)
202
The nonrecursive version
[Figure: the merging pattern of the nonrecursive version vs. the recursive version; both have about log n levels]
NOTE: The merging order may be different in the nonrecursive version.
type
  afile = array[1..max] of elementtype;
203
Merging two sorted lists
[Figure: X[l..m] and X[m+1..n] are merged into Z[l..n], using cursors i, j, k]
procedure merge(var X, Z: afile; l, m, n: integer);
// merge X[l..m] and X[m+1..n] into Z[l..n] //
// (this is just the union of ordered sets represented by sorted lists!) //
var i, j, k: integer;
begin
  i := l;  j := m+1;  k := l;
  while (i ≤ m) and (j ≤ n) do begin
    if X[i].key ≤ X[j].key then begin
      Z[k] := X[i];  i := i+1
    end
    else begin
      Z[k] := X[j];  j := j+1
    end;
    k := k+1
  end;
  if i > m then
    Z[k..n] := X[j..n]   // move the remaining items //
  else
    Z[k..n] := X[i..m]
end;
204
Time: O(n − l + 1)
procedure onepass(var X, Y: afile; n, l: integer);
// performs one pass of the merge sort: merges adjacent pairs of segments of length l from list X into list Y; n = |X| //
var i: integer;
begin
  i := 1;
  while i ≤ n − 2l + 1 do begin
    merge(X, Y, i, i+l−1, i+2l−1);
    i := i + 2l
  end;
  // merge the remaining segments of length < 2l //
  if (i + l − 1) < n then
    merge(X, Y, i, i+l−1, n)
  else
    Y[i..n] := X[i..n]
end;
205
Time: O(n)
procedure Msort(var X: afile; n: integer);
var l: integer;
    Y: afile;
begin
  // l is the size of the segments currently being merged //
  l := 1;
  while l < n do begin
    onepass(X, Y, n, l);
    l := 2*l;
    onepass(Y, X, n, l);
    l := 2*l
  end
end;
At most ⌈log₂n⌉ + 1 passes.
Each pass takes O(n) time.
Total: O(nlogn)
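A Python sketch (my own) of the same nonrecursive scheme: one pass merges adjacent segments of length l, and segment lengths double until the whole list is one run.

def merge(x, lo, mid, hi, y):
    """Merge sorted x[lo:mid] and x[mid:hi] into y[lo:hi]."""
    i, j, k = lo, mid, lo
    while i < mid and j < hi:
        if x[i] <= x[j]:
            y[k] = x[i]; i += 1
        else:
            y[k] = x[j]; j += 1
        k += 1
    y[k:hi] = x[i:mid] if i < mid else x[j:hi]   # move the remaining items

def msort(x):
    n = len(x)
    y = x[:]                       # second "file"
    l = 1                          # current segment length
    while l < n:
        for i in range(0, n, 2 * l):             # one pass: x -> y
            merge(x, i, min(i + l, n), min(i + 2 * l, n), y)
        x, y = y, x                # the next pass goes the other way
        l *= 2
    return x

print(msort([3, 5, 6, 4, 5, 9, 3, 7, 2, 8, 4, 6, 1]))
# [1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9]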
206
Example
X = 3 5 6 4 5 9 3 7 2 8 4 6 1
[Figure: contents of the two lists after each pass, for segment lengths l = 1, 2, 4, 8; the final pass yields 1 2 3 3 4 4 5 5 6 6 7 8 9]
207
Obs: The lists X and Y are scanned sequentially, from left to right, ⌈log₂n⌉ + 1 times.
Bin Sorting
Is Ω(nlogn) the lower bound for sorting n elements?
Yes, if we make no assumption about the key type and only use comparisons such as key1 ≤ key2.
What if we know 1 ≤ key ≤ n and the n elements have distinct keys?
To sort such n elements:
for i := 1 to n do
  B[A[i].key] := A[i];            -- O(n)
or
for i := 1 to n do
  while A[i].key ≠ i do
    swap(A[i], A[A[i].key]);      -- O(n)
208
Example
Sorting records that have a small number of distinct keys:
n records, O(logn)
distinct keys
Can we do better than O(nlogn)?
An algorithm using a modified 2-3 tree (an AVL tree is also okay):
[Figure: a modified 2-3 tree containing the O(log n) distinct keys (e.g. 2, 4, 5, 6, 7, 9, 11), each leaf holding the list of records with that key]
209
size of tree: O(logn)
each insert: O(loglogn)
Total time: O(nloglogn)
Bin Sorting
Key = 1..m (any finite and discrete type)
[Figure: the bin table B with bins 1..m, each bin a linked list]
procedure binsort;
var i: integer;  v: keytype;
begin
  for i := 1 to n do                                          -- O(n)
    INSERT(A[i], END(B[A[i].key]), B[A[i].key]);
  for v := 2 to m do                                          -- O(m)
    CONCAT(B[1], B[v])
end;
210
Bin sorting when m = n^k for some k
Example: k = 2, keytype = 0..n²−1
Step 1: Place each integer i into bin i mod n (append i to the end of the list for bin i mod n).
Step 2: Concatenate the lists.
Step 3: Place each integer i into bin ⌊i/n⌋.
Step 4: Concatenate the lists.
Each step: O(n).  Total time: O(n).
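A Python sketch (mine, not from the notes) of the two-pass scheme for k = 2: sort by i mod n first, then by ⌊i/n⌋; both passes are stable, so the result is fully sorted.

def binsort_pass(items, key, m):
    """Stable bin sort of items into bins 0..m-1 selected by key(x)."""
    bins = [[] for _ in range(m)]
    for x in items:
        bins[key(x)].append(x)           # append to the end of its bin
    out = []
    for b in bins:                       # concatenate the bins in order
        out.extend(b)
    return out

def sort_small_keys(A, n):
    """Sort integers in 0..n*n-1 in O(n) time using two bin-sort passes."""
    A = binsort_pass(A, lambda i: i % n, n)      # least significant "digit"
    return binsort_pass(A, lambda i: i // n, n)  # most significant "digit"

print(sort_small_keys([45, 36, 21, 64, 60, 33, 12, 27, 30, 25], 10))
# [12, 21, 25, 27, 30, 33, 36, 45, 60, 64]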
211
n = 10
Given: 45, 36, 21, 64, 60, 33, 12, 27, 30, 25

i mod 10 =
BIN   CONTENTS
0     60, 30
1     21
2     12
3     33
4     64
5     45, 25
6     36
7     27
8
9

New list: 60, 30, 21, 12, 33, 64, 45, 25, 36, 27

⌊i/10⌋ =
BIN   CONTENTS
0
1     12
2     21, 25, 27
3     30, 33, 36
4     45
5
6     60, 64
7
8
9
212
Radix Sort
type
  keytype = record
    f1: t1;
    f2: t2;
    .
    .
    fk: tk        // each ti is a finite, discrete type //
  end;
key1 = (a1, a2, …, ak)
key2 = (b1, b2, …, bk)
key1 < key2 iff
1. a1 < b1, or
2. a1 = b1 and a2 < b2, or
…
k. a1 = b1, …, ak−1 = bk−1, and ak < bk
i.e. ∃ i, 0 ≤ i < k, s.t. aj = bj for 1 ≤ j ≤ i and ai+1 < bi+1
e.g. abc < aca  (this is called lexicographic order)
213
var
  Bi: array[ti] of linked list;    // one bin table per field //
procedure radixsort;
// binsort list A first on fk, concatenate the bins of Bk, binsort on fk−1, and so on //
begin
  for i := k downto 1 do begin
    for each value v of type ti do
      make Bi[v] empty;
    for each record r on list A do
      move r from A onto the end of bin Bi[r.fi];      // binsort on fi //
    for each value v of type ti, from lowest to highest, do
      concatenate Bi[v] onto the end of A
  end
end;
Time: Σ_{i=1}^{k} O(|ti| + n) = O(kn + Σ_{i=1}^{k} |ti|)
214
Example
A = hact, fact, sack, camp, duck, kuck, codd, less, more
Pass on f4 (last letter):
  D: codd   E: more   K: sack, duck, kuck   P: camp   S: less   T: hact, fact
Pass on f3:
  C: sack, duck, kuck, hact, fact   D: codd   M: camp   R: more   S: less
Pass on f2:
  A: sack, hact, fact, camp   E: less   O: codd, more   U: duck, kuck
Pass on f1:
  C: camp, codd   D: duck   F: fact   H: hact   K: kuck   L: less   M: more   S: sack
215
Odd-even merge sort
(Useful when you have a parallel computer)
Algorithm Odd-even-merge-sort(a0, a1, …, a2n−1)
1. Split the list a0, a1, …, a2n−1 into the two lists a0, a1, …, an−1 and an, an+1, …, a2n−1
2. Odd-even-merge-sort(a0, a1, …, an−1)
3. Odd-even-merge-sort(an, an+1, …, a2n−1)
4. Odd-even-merge(a0, a1, …, an−1; an, an+1, …, a2n−1)
Algorithm Odd-even-merge(a0, a1, …, an−1; b0, b1, …, bn−1)
1. c0, c1, …, cn−1 := Odd-even-merge(a0, a2, …, an−2; b0, b2, …, bn−2)
2. d0, d1, …, dn−1 := Odd-even-merge(a1, a3, …, an−1; b1, b3, …, bn−1)
3. For all i > 0, compare ci and di−1 and interchange if necessary
4. Return: c0 c1 d0 c2 d1 c3 d2 … cn−1 dn−2 dn−1
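A Python sketch (my own) of the odd-even merge: the even-indexed and odd-indexed elements of the two sorted lists are merged recursively, and adjacent c/d pairs are then compare-exchanged. The two input lists are assumed to have the same power-of-two length.

def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-two length."""
    n = len(a)
    if n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])     # even-indexed elements
    d = odd_even_merge(a[1::2], b[1::2])     # odd-indexed elements
    out = [c[0]]
    for i in range(1, n):                    # compare c[i] with d[i-1]
        out += [min(c[i], d[i - 1]), max(c[i], d[i - 1])]
    out.append(d[n - 1])
    return out

print(odd_even_merge([4, 5, 8, 11], [2, 9, 10, 27]))    # hypothetical input
# [2, 4, 5, 8, 9, 10, 11, 27]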
216
Example
Odd-even-merge(4, 5, 8, 11, 20, 25;  2, 9, 10, 27, 30, 31)
  Odd-even-merge of the even-indexed elements (4, 8, 20; 2, 10, 30) returns c: 2, 4, 8, 10, 20, 30
  Odd-even-merge of the odd-indexed elements (5, 11, 25; 9, 27, 31) returns d: 5, 9, 11, 25, 27, 31
After the compare-exchanges: 2, 4, 5, 8, 9, 10, 11, 20, 25, 27, 30, 31
Sequential time complexity:
Odd-even-merge: T1(n) = 2T1(n/2) + cn = O(n log n)
Odd-even-merge-sort: T2(n) = 2T2(n/2) + c1·n·log n = O(n log²n)
In parallel, Odd-even-merge takes O(log n) time,
217
and Odd-even-merge-sort takes O(log²n) time.
Odd-even-merge(a0 a1 a2 a3; a4 a5 a6 a7)
[Figure: the comparison network for merging two sorted 4-element lists; the even-indexed and odd-indexed elements are merged recursively into c and d, and adjacent c/d pairs are then compare-exchanged]
218
Lower bound for sorting
Definition:
Let B be a problem and f(n) a function. B requires Ω(f(n)) time if every algorithm for B has time complexity Ω(f(n)) (i.e., the running time is at least f(n) in the worst case for inputs of length n).
f(n) is a time lower bound for B.
Theorem:
Sorting by comparisons requires Ω(nlogn) time (in fact, Ω(nlogn) comparisons).
Assumption: the only operations on keys are comparisons of key values:
  key1 < key2 ?   (yes / no)
Without loss of generality, assume the keys are distinct.
decision trees
219
Let P be any sorting algorithm. Denote the input by A[1..n] : a1, a2, …, an.
Define a binary tree as follows:
[Figure: each internal node is a comparison A[i] < A[j]? with a yes branch and a no branch; each leaf is an outcome, i.e. a sorted list a_r1, a_r2, …, a_rn]
220
This is called the decision tree for P on inputs of size n.
Decision tree for bubble sort with n = 3
for i := 1 to 2 do
  for j := 3 downto i+1 do
    if A[j] < A[j−1] then swap(A[j], A[j−1])
A[1..3] = a b c
[Figure: the decision tree; the comparisons are A[3] < A[2]?, A[2] < A[1]?, A[2] < A[3]?, …, and the six leaves are the outcomes c b a, c a b, a c b, b c a, b a c, a b c]
221
Fact:
For any sorting algorithm A, the decision tree for A must have at least n! leaves.
Proof: There are n! outcomes when A sorts n elements.
Fact:
The depth of the decision tree must be at least ⌈log₂(n!)⌉.
Proof: Let the depth be d. Then n! ≤ 2^d, so d ≥ ⌈log₂(n!)⌉.
Corollary:
A requires at least ⌈log₂(n!)⌉ comparisons in the worst case.
n! ≈ (n/e)^n,  e = 2.7183
log₂(n!) ≈ n·log₂(n/e)
222
n·log₂(n/e) = Ω(nlogn)
∴ Sorting requires Ω(nlogn) comparisons.
∴ Sorting requires Ω(nlogn) time.
Average time complexity for sorting
  = the average depth of the leaves in the decision tree
Claim:
Among the n! leaves, at least half of them have depth ≥ log₂(n!/2).
Proof:
The maximum number of leaves with depth ≤ log₂(n!/2) − 1 is 2^(log₂(n!/2)) = n!/2.
∴ on average, sorting requires Ω(log₂(n!)/2) = Ω(nlogn) time.
223
Problem
Given a1, a2, …, an s.t. a1 < a2 < … < an, and x, find i s.t. ai = x.
Binary search: O(logn) time.
Fact: searching requires Ω(logn) time.
Proof: any of a1, a2, …, an could be x (and x may not be present at all),
∴ there are at least n + 1 outcomes when we search for x in a1, …, an,
∴ a decision tree must have depth ≥ log₂(n+1).
Problem
Given a1, a2, …, an, find the smallest element.
[Figure: comparison nodes ai < aj]
The n−1 elements other than the minimum must each have lost some comparison, so n−1 comparisons are needed.
224
External sorting
Assumption:
The number of data items to be sorted is too large to fit in main memory.
The data items (records) are stored on external storage devices in the form of (sequential) files.
External storage devices:
Magnetic tape
[Figure: a tape with blocks i, i+1, … separated by inter-block gaps, and a read/write head]
Operations: Read(B), Write(B), fast-forward, rewind
225
Magnetic Disk
[Figure: a disk surface showing a track, a sector (= block), and the R/W head]
To access a sector:
1. locate the correct track by shifting the R/W head;
2. wait until the correct sector arrives.
The time needed to access a block = seek time + actual R/W time >> main-memory access time.
file: a sequence of blocks; each block holds a fixed number of records.
226
 In external sorting, the dominating factor is the number of block accesses.
 It is desirable to scan a file from beginning to end.
The model
[Figure: file1, file2, file3, … on disk; the CPU; main memory holding C blocks (buffers)]
Objective: sorting with minimum number of passes through the file (thus,
minimum number of block accesses)
227
Bubble, insertion, …, quick, heap sorts: require on the order of n passes.
2-way merge sort: requires only ⌈log₂n⌉ passes!
ASSUME THAT FILES ARE STORED
ON DISKS. THUS, SEEK TIME IS
THE “SAME” FOR ALL BLOCKS.
EXAMPLE
Sort file F = A1,A2,…A2100
A block = 100 records
Working main memory space
= 3 blocks ( used as buffers)
228
Step 1:
Internally sort three blocks (300 records) at a time. Store the resulting file on disk.
run1: 1-300   run2: 301-600   run3: 601-900   run4: 901-1200   run5: 1201-1500   run6: 1501-1800   run7: 1801-2100
Step 2:
partition the main memory into three blocks.
Two are used as input buffers and the third is used as an output
buffer.
Merge runs 1 and 2 .
Algorithm Merge(R1, R2):
Read a block from R1; read a block from R2.
Merge the records in the input buffers and store the result in the output buffer.
If the output buffer gets full, write its contents to the disk and clear the buffer.
If an input buffer gets empty, read the next block from the same run.
229
Merge runs 3 and 4, then 5 and 6 , then copy run 7
The result of this is a file of 4 runs.
Merge these runs and produce a file of two runs.
Merge the two runs to obtain a single run (i.e., a sorted file).
[Figure: the original file F and the files F1, F2, F3 produced by the successive merge passes]
230
Notes:
1. If the number of initial runs is m, then ⌈log₂m⌉ passes suffice.
2. If the device is tape, then we need four tapes.
[Table: how the runs are distributed over the four tapes from pass to pass; initially tapes 1 and 2 hold runs 1, 3, 5, 7 and 2, 4, 6, the merged runs are written alternately to tapes 3 and 4, and so on until a single run remains]
3. Temp files can be discarded after being used.
4. k-way merge (e.g. 3-way).
Generally, k-way merge sort requires ⌈log_k m⌉ = ⌈log₂m / log₂k⌉ passes.
It needs k+1 buffers and more comparisons (k−1 per record) in each pass.
For tapes, k-way merge requires 2k tapes.
232
General algorithm design techniques
Divide-and-conquer: top-down, recursive; e.g. merge sort, quick sort
Dynamic programming: bottom-up; e.g. longest common subsequence
Greedy: e.g. shortest paths, minimum-cost spanning tree
Brute force
Backtracking: e.g. rat-in-maze
Divide-and-conquer
To solve problem A:
if A is small enough then
  solve it directly
else begin
  break A into smaller problems A1, A2, …, Ak (smaller instances of the same problem);
  solve Ai for each i = 1, 2, …, k;
  combine the solutions for A1, …, Ak to obtain the solution for A
end
233
Example: Towers of Hanoi
[Figure: three pegs A, B, C]
Algorithm Move(n, A, B)
// move n disks from A to B //
if n = 1 then
  move the disk to B
else begin
  Move(n−1, A, C);
  Move(1, A, B);
  Move(n−1, C, B)
end
T(n) = c1 if n = 1;  2T(n−1) + c2 otherwise
     = O(2ⁿ)
234
Example
Given n integers, find both the maximum and the minimum.
Algorithm maximin(A[1..n], max, min)
if n = 1 then
  max := min := A[1]
else if n = 2 then
  if A[1] < A[2] then begin
    max := A[2];  min := A[1]
  end else begin
    max := A[1];  min := A[2]
  end
else begin   // n > 2 //
  maximin(A[1..n/2], max1, min1);
  maximin(A[n/2+1..n], max2, min2);
  if max1 < max2 then max := max2 else max := max1;
  if min1 ≤ min2 then min := min1 else min := min2
end
235
C(n) = 1                    if n = 2
     = 2C(n/2) + 2          otherwise   (number of comparisons)
C(n) = (3/2)n − 2   (by induction)
Dynamic programming
There are situations where:
(i) there is no way to divide a problem into a small number of subproblems;
(ii) the subproblems overlap each other (too much redundancy if divide-and-conquer is used);
(iii) the total number of subproblems to tackle is not large, i.e. polynomial (nᵏ, usually with k = 2, 3).
236
Dynamic programming approach:
Systematically solve all the subproblems, smallest ones first.
Keep track of the solutions to the solved subproblems by means of a table.
Solutions to larger subproblems are found by combining solutions to smaller subproblems.
Example
Longest common subsequence problem
(LCS)
sequence : x =
a1a2…an
subsequence of x: a sequence obtained from x by deleting some characters
e.g. ab and ca are subsequences of cab
LCS Problem:
given x = a1a2…an
y = b1b2…bm
find the length of an LCS of x and y
237
Previous solution (using sets): O(p·logn) time, where p = the number of pairs of positions, one from each sequence, that have the same character.
In the worst case, p = O(nm), so the time is O(nm·logn).
Dynamic programming solution
Given x = a1a2…an and y = b1b2…bm.
Define an (n+1) × (m+1) matrix L:
L[i,j] = the length of an LCS of a1a2…ai and b1b2…bj, for all 0 ≤ i ≤ n, 0 ≤ j ≤ m.
Note: L[0,j] = L[i,0] = 0.
L[n,m] is the length of the LCS of x and y.
Each L[i,j] is a subproblem.
238
L[0,j] = 0 for 0 ≤ j ≤ m,   L[i,0] = 0 for 0 ≤ i ≤ n
For 1 ≤ i ≤ n, 1 ≤ j ≤ m:
L[i,j] = max( L[i−1,j],
              L[i,j−1],
              L[i−1,j−1] + 1 if ai = bj, and 0 otherwise )
a1 … ai−1 ai
b1 … bj−1 bj
239
[Figure: the (n+1) × (m+1) matrix L; row 0 and column 0 are all 0, L[i,j] is computed from L[i−1,j], L[i,j−1] and L[i−1,j−1], and L[n,m] holds the solution]
Algorithm LCS
// evaluate matrix L row by row , with row 0 first //
240
for j := 0 to m do
  L[0,j] := 0;
for i := 1 to n do
  L[i,0] := 0;
for i := 1 to n do
  for j := 1 to m do begin
    if ai = bj then
      temp := L[i−1,j−1] + 1
    else
      temp := 0;
    L[i,j] := max(L[i−1,j], L[i,j−1], temp)
  end;
writeln(L[n,m])
Time = O(nm)
Space = O(min(n,m))
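A Python sketch (my own) of the table-filling algorithm; for clarity it keeps the full (n+1) × (m+1) table, although two rows would suffice for the O(min(n,m)) space bound.

def lcs_length(x, y):
    """Length of a longest common subsequence of strings x and y."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]     # row 0 and column 0 are 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = L[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1] else 0
            L[i][j] = max(L[i - 1][j], L[i][j - 1], diag)
    return L[n][m]

print(lcs_length("cab", "bbab"))   # 2  (an LCS is "ab"; hypothetical strings)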
Recursive solution (divide-and-conquer)
Algorithm LCS(n, m)
241
if n = 0 or m = 0 then
  LCS := 0
else begin
  l1 := LCS(n−1, m);
  l2 := LCS(n, m−1);
  if an = bm then
    l3 := LCS(n−1, m−1) + 1
  else
    l3 := 0;
  LCS := max(l1, l2, l3)
end;
T(n,m) = T(n−1,m) + T(n,m−1) + T(n−1,m−1) + c = O(3^(n+m))
Dynamic programming example 2
World Series Odds
Problem:
Teams A and B play a match. Whoever wins n games first wins
the match.
242
Assumption: A and B are equally competent, i.e., each has a 50% chance of
winning a particular game.
P(i,j): the probability that A will eventually win the match, given that A still needs i games to win (i.e., A has won n−i games) and B still needs j games, 0 ≤ i, j ≤ n.
We want to compute P(s,t) for some particular 0 ≤ s, t ≤ n.
P(0,j) = 1   for 1 ≤ j ≤ n
P(i,0) = 0   for 1 ≤ i ≤ n
P(i,j) = ( P(i−1,j) + P(i,j−1) ) / 2   for 1 ≤ i, j ≤ n
243
[Figure: the (n+1) × (n+1) table P; row 0 is all 1s, column 0 is all 0s, and P(i,j) is computed from P(i−1,j) and P(i,j−1)]
Order of evaluation:
1.
row by row/column by column
2.
diagonal
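A Python sketch (mine, not from the notes) of the table evaluation, filling row by row; P[i][j] is the probability that A eventually wins when A still needs i games and B still needs j, and n = 4 is a hypothetical match length.

def odds(n):
    """P[i][j] = Pr(A wins the match | A needs i more wins, B needs j more)."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        P[0][j] = 1.0                 # A has already won
    for i in range(1, n + 1):
        P[i][0] = 0.0                 # B has already won
        for j in range(1, n + 1):
            P[i][j] = (P[i - 1][j] + P[i][j - 1]) / 2
    return P

P = odds(4)
print(P[4][4])   # 0.5      (symmetric situation)
print(P[2][3])   # 0.6875   (A needs 2 wins, B needs 3)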
Greedy Algorithm
Setting :
given n objects a1,a2,…an,
Each with a weight ( or cost) w(ai)
244
We want to select a subset of objects a_i1, a_i2, …, a_im, subject to some constraint, such that
  Σ_{j=1}^{m} w(a_ij)
is minimum.
Example: Coin Changing
A1 = {c1, c2, …, cn} is a set of distinct coin types, c1 > c2 > … > cn ≥ 1.
How do we make up an exact amount x using a minimum total number of coins?
If cn = 1, then a greedy algorithm can be used.
Algorithm coinchange(x);
i := 1;
while x ≠ 0 do begin
  if ci ≤ x then begin     // select the largest coin whose value is ≤ x //
    writeln(ci);
    x := x − ci
  end
  else
    i := i + 1
end
245
e.g.
c1 = 25¢ c2 = 10¢ c3 =5¢ c4 = 1¢
x =73¢
change :
c1, c1, c2, c2, c4, c4, c4
Notes:
1. The algorithm doesn't necessarily generate change with the minimum total number of coins.
   e.g., c1 = 5, c2 = 4, c3 = 1, x = 8  (greedy uses 5+1+1+1, but 4+4 is better)
2. It does so if A1 = { k^(n−1), k^(n−2), …, k^0 } for some k.
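A Python sketch (my own) of the greedy procedure, followed by the counterexample from note 1; the coin lists are the ones used in the examples above.

def coinchange(x, coins):
    """Greedy change-making; coins must be in decreasing order, with 1 last."""
    change = []
    i = 0
    while x != 0:
        if coins[i] <= x:          # largest coin whose value is <= x
            change.append(coins[i])
            x -= coins[i]
        else:
            i += 1
    return change

print(coinchange(73, [25, 10, 5, 1]))   # [25, 25, 10, 10, 1, 1, 1]
print(coinchange(8, [5, 4, 1]))         # [5, 1, 1, 1]  -- not minimal (4 + 4 is)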
Matching.
246
[Figure: a bipartite graph with V1 = {1,…,5} and V2 = {6,…,10}]
1. Start with M = ∅.
2. Find an augmenting path P relative to M and replace M by M ⊗ P.
3. Repeat (2) until no further augmenting path exists; then M is maximal.
[Figure: the bipartite graph, initially with no matching edges]
247
P = 1, 6;   M = ∅ ⊗ {(1,6)} = {(1,6)}
[Figure: the graph with the current matching]
P = {(2,6), (1,6), (1,7)};   M = {(1,6)} ⊗ P = {(2,6), (1,7)}
[Figure: the graph with the current matching]
248
P = {(3,7), (1,7), (1,6), (6,2), (2,9)};   M = {(2,6), (1,7)} ⊗ P = {(3,7), (1,6), (2,9)}
249
[Figure: the graph with the current matching; a partial path that ends at a matched vertex doesn't work]
P = {(4,9), (2,9), (1,6), (1,8)};   M = {(3,7), (1,6), (2,9)} ⊗ P = {(4,9), (3,7), (1,8)}
[Figure: the graph with the current matching]
250
P = {(2,9), (4,9), (4,10)};   M = {(4,9), (1,8), (3,7)} ⊗ P = {(2,9), (1,8), (4,10), (3,7)}
[Figure: the graph with the current matching]
251
P = {(5,6)};   M = {(2,9), (1,8), (4,10), (3,7)} ⊗ P = {(2,9), (1,8), (4,10), (3,7), (5,6)}
252