Lecture Notes - McMaster Computing and Software
THE PROBLEM SOLVING PROCESS

MATHEMATICAL MODEL  →  INFORMAL ALGORITHM  →  PSEUDO-LANGUAGE PROGRAM (OR OTHER FORMAL DESCRIPTION)  →  PROGRAM (Pascal, C, C++, etc.)
ABSTRACT DATA TYPES  →  DATA STRUCTURES  →  PROGRAM

DATA TYPE VERSUS ABSTRACT DATA TYPE

DATA TYPE: Set of values (or objects).

ABSTRACT DATA TYPE (ADT): Set of objects + a mathematical model with a collection of operations defined on the model.
2
DATA TYPE: Set of values (or objects).
Fortran 77:
LOGICAL
INTEGER, REAL, CHARACTER/STRING
Composite types: Array of Integers
Array of reals
Etc.
Pascal:
Basic types: integer, real, character, Boolean
Composite types:
array of integers
Array of characters
Etc.
Record of integers/reals/characters
Etc.
Set of….
File of…
 THERE ARE OPERATIONS ASSOCIATED WITH EACH
TYPE
 AGGREGATING TOOLS: array, record, file
3
C:
Basic types: int, real, char
Composite types: arrays, structures
WHAT ABOUT POINTERS?
Pointer can be treated as a data type, but usually it’s treated
as a DATA STRUCTURING FACILTY.
4
ABSTRACT DATA TYPE (ADT)

Set of objects plus a mathematical model with a collection of operations defined on the model.

Example: List a1, a2, …, an

LIST (of integers) is an ADT with the following operations:
1. Calculate the length of the list
2. Get the first member of the list and return null if empty
3. Retrieve the member at position P and return null if P doesn't exist
4. Locate X in the list
5. Insert X into the list at position P
6. Delete the member at position P
5
Example:   P = 1   2   3   4   5   6   7   8
           L = 50, 60, 23, 47, 21, 39, 60, 40

1. LENGTH(L) = 8
2. FIRST(L) = 50
3. RETRIEVE(4, L) = 47
   RETRIEVE(9, L) = null
4. LOCATE(60, L) = 2
5. INSERT(30, 5, L) gives the result:
   L = 50, 60, 23, 47, 30, 21, 39, 60, 40
6. DELETE(3, L) gives the result:
   L = 50, 60, 47, 30, 21, 39, 60, 40

ALL OPERATIONS ARE ATOMIC EXCEPT FIRST, SINCE
FIRST(L) = RETRIEVE(1, L)
6
EXAMPLE: ADT STACK (OF INTEGERS)

1. Retrieve the top element (TOP)
2. Delete the top element (POP)
3. Insert x at the top (PUSH)
4. Test if the stack is empty (EMPTY)

S = 27, 40, 32    (top element first)

1. TOP(S) = 27
2. POP(S) results in  S = 40, 32
3. PUSH(0, S) results in  S = 0, 27, 40, 32
4. EMPTY(S) = false
7
Example: ADT MATRIX (OF REALS)
1. Return number of rows
2. Return number of columns
3. Multiply matrices A and B
4. Add A and B
5. Compute the transpose of matrix A
6. Delete a rows/column
7. Add a row/column
8. Multiply matrix A by real number 6
OBSERVATIONS:
1. Domain of an operation may involve more than one ADT
Type
2. Some operations are partial
3. Range of an operation may be a different ADT
8
A simple application of ADTs – evaluation of arithmetic expressions
a + b*c/d**e + f

Algorithm Value (x : expression);
  oprnd : STACK OF REALS
  optor : STACK OF CHARS
  x1, x2 : REAL
  i : INTEGER
  Initialize oprnd and optor
  for i := 1 to LEN(x) do
    case x[i] of
      real: PUSH(x[i], oprnd)
      char: if TOP(optor) < x[i] then     { operator on top has lower precedence }
              PUSH(x[i], optor)
            else begin
              repeat
                x2 := TOP(oprnd); POP(oprnd);
                x1 := TOP(oprnd); POP(oprnd);
                x1 := x1 TOP(optor) x2;    { apply the operator on top of optor }
                PUSH(x1, oprnd);
                POP(optor)
              until TOP(optor) < x[i];
              PUSH(x[i], optor)
            end
    endcase
  Value := TOP(oprnd)
9
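The following is a minimal C sketch of the two-stack evaluation above, assuming single-digit operands and the operators + - * / only (no parentheses); the names prec, apply, reduce and the fixed-size stacks are ours, not from the notes.

#include <stdio.h>

static double oprnd[100]; static int ntop = 0;      /* operand stack  */
static char   optor[100]; static int otop = 0;      /* operator stack */

static int prec(char op) {                           /* operator precedence */
    return (op == '+' || op == '-') ? 1 : (op == '*' || op == '/') ? 2 : 0;
}
static double apply(char op, double a, double b) {
    switch (op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:  return a / b;
    }
}
static void reduce(void) {                /* pop one operator and apply it */
    double b = oprnd[--ntop], a = oprnd[--ntop];
    oprnd[ntop++] = apply(optor[--otop], a, b);
}

double value(const char *x) {
    for (int i = 0; x[i] != '\0'; i++) {
        if (x[i] >= '0' && x[i] <= '9')
            oprnd[ntop++] = x[i] - '0';               /* PUSH operand */
        else {
            /* while the operator on top has >= precedence, apply it */
            while (otop > 0 && prec(optor[otop-1]) >= prec(x[i]))
                reduce();
            optor[otop++] = x[i];                     /* PUSH operator */
        }
    }
    while (otop > 0) reduce();                        /* empty the operator stack */
    return oprnd[--ntop];
}

int main(void) {
    printf("%g\n", value("1+2*3/2+4"));               /* prints 8 */
    return 0;
}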
Comparison of ADT’s with procedure – the advantages
1.
GENERALIZATION
Procedures are generalization of primitive operations
(e.g. +, -, *,….)
ADT’s are generalizations of primitive data types.
2.
ENCAPSULATION (OR MODULARITY)
A procedure encapsulates all the statements relevant to a
certain aspect of a program.
An ADT encapsulates all the definitions and the
operations relevant to a data type.
How to implement an ADT?
Note that a data structure doesn’t have to be associated with an
ADT.
10
Data Structure: A collection of data objects connected in various
ways.
A data structure is always associated with a specific programming
language.
FORTRAN 77: the only data structuring facility is ARRAY
PASCAL: we have: ARRAY, RECORD, and POINTER
C: ARRAY, STRUCTURE, POINTER
Some important terms:
Cell:
a box capable of holding a value drawn from some basic
or composite data types (e.g. integer, record…)
CELL IS BASIC BUILDING BLOCK OF DATA STRUCTURE
Pointer: a cell whose value indicates another cell
Cursor: an integer-valued cell, used as a pointer to an array
11
Example: A simple data structure is given below.
It may be used in the implementation of ADT MATRIX.
a11
a11
a11
a22
a22
a22
am1

am1

am1
A pointer
12




A CELL
Type
Cell type = record
Element: real
Down:
cell type
Right:
cell type
End
13
Example: The data structure below may be used in the implementation of ADT LIST.

L = 7.8, 1.2, 5.6, 3.4      (header cursor = 4;  0 ≡ nil pointer)

  index   element   next
    1       1.2      3
    2       3.4      0
    3       5.6      2
    4       7.8      1

type
  cursor = integer;
  recordtype = record
    element : real;
    next : cursor
  end
14
ALGORITHM VERSUS PROGRAM

• An algorithm is a finite sequence of instructions satisfying the following criteria:
  1) Definiteness: each instruction must be clear and unambiguous.
  2) Finiteness: the algorithm will terminate after a finite number of steps for all cases.
  3) Effectiveness: each instruction can be performed using a finite amount of resources (time and space).

• A (well-defined) program in principle is similarly described, but a program:
  1) is always associated with a specific programming language;
  2) may not halt (e.g. operating systems).

All the programs we are interested in halt; pseudo-Pascal is our chosen language.

WE WILL USE "ALGORITHM" AND "PROGRAM" INTERCHANGEABLY.
15
Examples:
Proc search (x: integer; A : array [1…10] of integer)
i=1;
while x <> A[i] and I <= 10 do i:= i + 1;
search: = i
end
proc print ( S: set)
Print the elements of S
end
proc Pi
print all the digits of Pi } never ends
end
16
The running time of a program depends on
1)
2)
3)
4)
Computer speed
Compiler quality
Input to program
Program efficiency (or quality)
The TIME COMPLEXITY of a program is defined as a function
of input, usually the SIZE of input.
17
Program A is of worst-case time complexity T(n) if the maximum running time of A on any input of size n is T(n).

THE UNITS OF T(n) ARE UNSPECIFIED.

Although the constants in T(n) are important, we are more interested in the growth rate (or order) of T(n).

e.g.   2n ≈ 10n + 1000
       2n << n² when n is large

f(n) << g(n)   ↔   lim_{n→∞} f(n)/g(n) = 0
18
IMPORTANT DEFINITION

T(n) is O(f(n)) if there are constants C and n₀ such that
    T(n) ≤ C·f(n)  when  n ≥ n₀.

Note: O(f(n)) actually denotes a class of functions of the same or slower growth rate, and it would be better to write T(n) ∈ O(f(n)).

"=" here stands for "is", not "equals".
Examples:

3n² + 16n + 8 = O(n²):
    take C = 4, n₀ = 17;   n > 17  ⟹  3n² + 16n + 8 ≤ 4n².

n·log n = O(n²):
    n > 0  ⟹  n·log n < n²,  so C = 1, n₀ = 0.

3n³ − 6n² ≠ O(n²):
    for every C there is an n₀ such that  n > n₀ ⟹ 3n³ − 6n² > Cn².
    If n > 0 then  3n³ − 6n² > Cn²  ⟺  3n − 6 > C.
    Hence if n > (C + 6)/3 then 3n³ − 6n² > Cn².
20
Σ_{i=0}^{k} aᵢnⁱ = O(nᵏ)   when aₖ > 0

10⁶ = O(1) = O(2)
100n + 10⁵ = O(n)
n⁴ + n² + n + 6 = O(n⁴)
2ⁿ + n¹⁰⁰ = O(2ⁿ)
3ⁿ >> O(2ⁿ)
log₁₀ n = O(log₂ n),  since log₁₀ n = log₂ n / log₂ 10

O(f(n)) is an upper bound on the growth rate (order) of T(n) if T(n) = O(f(n)).
21
To specify a lower bound, we use Ω.

DEFINITION: T(n) is Ω(f(n)) if there is a constant C such that T(n) ≥ C·f(n) infinitely often.

½n + 100 = Ω(n):   take C = ½;  ½n + 100 ≥ ½·n for all n.

T(n) = n        if n is odd,  n ≥ 1
     = n²/100   if n is even, n ≥ 0

T(n) = Ω(n²):   with C = 1/100,  T(n) ≥ C·n²  for n = 0, 2, 4, 6, …
22
WHY IT IS IMPORTANT?
5n2
2n
n3/2
100n
3000
2000
1000
5
10
15
20
Running times of 4 programs
1000 jek ≈ 17 minutes
23
n
 HOW LARGE A PROBLEM CAN WE SOLVE?
 SUPPOSE THAT WE BUY A MACHINE THAT RUNS 10 TIMES FASTER AT NO ADDITIONAL COST. THEN FOR THE SAME COST WE CAN SPEND 10⁴ SECONDS ON A PROBLEM WHERE WE SPENT 10³ SECONDS BEFORE.

Running time   Max problem size   Max problem size   Increase in
T(n)           for 10³ sec        for 10⁴ sec        max problem size
100n                10                 100               1000%
5n²                 14                  45                320%
n³/2                12                  27                230%
2ⁿ                  10                  13                130%

THE O(2ⁿ) PROGRAM CAN SOLVE ONLY SMALL PROBLEMS NO MATTER HOW FAST THE UNDERLYING COMPUTER IS.
24
THEOREM
IF T1 (n) = 0 (f(n)) AND T2 (n) = 0 (g(n))
THEN
T1 (n) + T2 (n) = 0 (max (f (n)), g (n)).
PROOF
THERE ARE c1, n1, c2, n2 SUCH THAT
n ≥ n1  T1 (n) ≤ c1 f (n)
n ≥ n2  T2 (n) ≤ c2 g (n)
LET n3 = max (n1,n2). THEN
n ≥ n3  T1 (n) + T2 (n) ≤ c1 f (n) + c2 g (n) ≤ (c1 + c2) max (f(n), g
(n)).
ENDPROOF
HENCE: 0 ( f (n)) + 0 (g (n)) = 0 (max (f (n),g (n)))
0 ( n2) + 0 (n3) = 0 (n3)
0 (n2) + 0 (2n2) = 0 (2n2) = 0 (n2)
25
THEOREM
IF T1 (n) = 0 (f (n)) AND T2 (n) = 0 (g (n))
THEN
T1 (n) T2 (n) = 0 (f (n) g(n))
PROOF
THERE ARE c1, n1, c2, n2 SUCH THAT
n ≥ n1  T1 (n) ≤ c1 f (n)
n ≥ n2  T2 (n) ≤ c2 g (n)
LET n3 = max (n1,n2). THEN
n ≥ n3  T1 (n) T2 (n) ≤ c1 c2 f(n) g (n)
ENDPROOF
HENCE: 0 ( f (n)) 0 (g (n)) = 0 (f (n) g (n))
0 ( n2) 0 (n5) = 0 (n7)
0 (n2) 0 (2nh) = 0 (n22h) = 0 (2h+2logh)
OTHER IMPLICATIONS
f(n)
f (n)
Σ
0 (g (i, n)) = 0 ( Σ
i=1
i=1
g (i,n))
max (0 (f(n)), 0(g (n)) = 0 (max(f (n)), g(n)))
26
0 (f(n)) = 0 (g (n))
* ASYMMETRIC!
↨
0 (f (n)) ≤ 0 ( g(n))
 ═
Means
IS
MY CAT IS BLACK ≠ BLACK IS MY CAT
0
: FUNCTION → SET OF FUNCTIONS
N2
n2
2
1000 n + 5
0
DEF: 0 (f (n)
0 (f (n) == 0 (f (n)
Any Operator
+, ., ETC.
27
g (n))
0 (n2)
OTHER USEFUL RULES
f(n) ═ 0 (f (n))
C 0(f(n)) ═ 0 (f(n))
0 (f(n)) + 0 (f(n)) ═ 0(f(n))
0 (0 (f(n)) ═ 0 (f(n))
0 (f(n))0(g(n)) ═ 0 (f(n))g(n))
0 (f(n)g(n)) ═ f(n)0(g(n))
REMEMBER:
═
HERE IS ASYMMERIC!
28
CALCULATING COMPLEXITIES OF ALGORITHMS

procedure bubble (var A : array [1..n] of int);
{ bubble sort A into increasing order }
var i, j, temp : integer;
begin
(1)  for i := 1 to n-1 do
(2)    for j := n downto i+1 do
(3)      if A[j-1] > A[j] then begin
           { swap A[j-1] and A[j] }
(4)        temp := A[j-1];
(5)        A[j-1] := A[j];
(6)        A[j] := temp
         end
end;

(3)–(6)  TAKES O(1)
(2)–(6)  TAKES (n−i)·O(1) + O(1) = O(n−i)
(1)–(6)  TAKES  Σ_{i=1}^{n-1} [O(n−i) + O(1)]
               = O( Σ_{i=1}^{n-1} (n−i) ) = O(n(n−1)/2) = O(n²−n) = O(n²)
function test (m : integer) : Boolean;
{ tests if m is a power of 2, i.e. m = 2^k for some k }
begin
(1)  if m = 1 then test := true
(2)  else
(3)    if (m mod 2 = 0) then test := test(m div 2)
(4)    else
(5)      test := false
end

LET T(m) = time complexity of test.
Cost per line:  (1) → c1,  (2) → c2,  (3) → c1,  (4) → c2 + T(m/2),  (5) → c2

T(m) = c1 + c2                   m = 1
     = 2c1 + c2                  m odd, m > 1
     = 2c1 + c2 + T(m/2)         m even

A recurrence equation.
30
Define a new function:

T'(m) = c1 + c2                    m ≤ 1
      = 2c1 + c2 + T'(m/2)         m > 1

Then T(m) ≤ T'(m) for all m > 0, i.e. T'(m) is an upper bound of T(m).
Note: T'(m) is defined for all real numbers.

T'(m) = 2c1 + c2 + T'(m/2)
      = 2(2c1 + c2) + T'(m/2²)
      = 3(2c1 + c2) + T'(m/2³)
      …
      = ⌈log₂m⌉·(2c1 + c2) + T'(m/2^⌈log₂m⌉)
      = (2c1 + c2)·⌈log₂m⌉ + c1 + c2
      = O(log m)

THUS: T'(m) = O(log m),  T(m) = O(log m)
m
2[log2m]
Worst case
occurs when
M= 2k
Ceiliuy :
is the smallest integer ≥ x
[x]
→
e.g.
[1.5] = 2
[3.1] = 4
[3.0] = 3
NOTE THAT:
2[log2m] ≥ m
IF
m = 2k
[log2m] = k
log2m = k,
and 2[log2m] = m
m=6
log24 = 2 & log2 8 = 3 → 2 <log26 < 3 → [log26] =3
>m
32
→ 2[log2m]
Problem:
What is T (m) ? Is m the length of input!
M is 100, 15, 64, etc, just number!!
 IF m is BINARY and is the number of bits of m, THEN
n = [log2m]
And
i.e.
T (m) = 0 (log2m) → T (n) = T ([log2m]) = 0 (log2m)
T (n) = 0 (n)
 M CAN BE TREATED AS : 000….0,
m
I.E. m UNITS, THEN
THEN “LENGTH” OF M IS m, and
T (n) = T (m) = 0 (log n)
33
DESIGN OF A PROGRAM
 TOP – DOWN / BOTTOM – UP APPROACH
 STEPWISE REFINEMENT, COOSE ADT’S AND DATA
STRUCTURES
 CODING
A REMARK ABOUT RUNNING TIME
 ALTHOUGH THE ORDER OF RUNNING TIME IS VERY
IMPORTANT, WE SHOULD ALSO CONSIDER THE
FOLLOWING FACTORS IN PRACTIC.
1.
THE TIME IT TAKES TO WRITE AND DEBUG THE
PROGRAM
2.
READABILITY, MODULARITY, ETC. HOW HARD
IS TO MAINTAIN THE PROGRAM
3.
SOMETIMES CONSTANTS ARE ALSO IMPORTANT
4.
SPACE (OR STORAGE) COMLEXITY
5.
ACURACY
34
ADT LIST

A list is a sequence of zero or more elements of a given type (the element type).

L = a1, a2, a3, …, an
  length = n
  first = a1,  last = an
  ai is at position i;  a_{i-1} precedes a_i;  a_i follows a_{i-1}
  END(L) = position n+1

Operations:
  INSERT(x, p, L);  DELETE(p, L);
  LOCATE(x, L);  RETRIEVE(p, L);
  MAKENULL(L): L ← ε
  FIRST(L);  NEXT(p, L);  PREVIOUS(p, L);
  PRINT(L);  LENGTH(L);  REVERSE(L);
  CONCAT(L1, L2);  EMPTY(L);  etc.
35
Array implementation of lists

[Figure: elements a1 … an stored in positions 1 … last of an array of size max; positions last+1 … max are empty.]

const
  max = ?;
type
  position = 1..max;
  LIST = record
    elements : array [position] of elementtype;
    last : 0..max
  end;

function END (var L : LIST) : integer;
begin
  END := L.last + 1
end;
36
[Figure: inserting x at position p shifts a_p … a_n one place to the right.]

procedure INSERT (x : elementtype; p : position; var L : LIST);
var q : position;
begin
  if L.last = max then
    error('list is full')
  else if (p > L.last + 1) or (p < 1) then
    error('position does not exist')
  else begin
    for q := L.last downto p do
      L.elements[q+1] := L.elements[q];   { shift to the right }
    L.last := L.last + 1;
    L.elements[p] := x
  end
end;

Time complexity:  INSERT, DELETE, LOCATE – O(n)
                  RETRIEVE, NEXT, PREVIOUS, END, FIRST, MAKENULL – O(1)
Average time:     INSERT, DELETE, LOCATE – O(n)
37
Pointer implementation (linked list)

[Figure: a header cell L followed by cells holding a1, a2, …, an, each cell pointing to the next; the last cell's next is nil.]

type
  celltype = record
    element : elementtype;
    next : ↑celltype
  end;
  LIST = ↑celltype;
  position = ↑celltype;

Position i : a pointer to cell i−1,  1 ≤ i ≤ n+1.

function END (L : LIST) : position;
var q : position;
begin
  q := L;
  while q↑.next <> nil do
    q := q↑.next;
  END := q
end;
38
cell n
an
.
LIST: record
first: ↑ celltype;
last: ↑ celltype
end;
Insert x at p
……
time O(1)
a
…..
b
p
Delete cell at p
Time O(1)
…
a
c
b
….
p
Time O(n)
L
Header
p
PREVIOUS (p, L)
39
INSERT, DELETE, RETRIVE, NEXT, FIRST, MAKENULL –
O(1)
PREVIOUS, LOCATE, END – O(n)
Compare the two implementations:

1. maximum size of the list must be fixed – array
2. waste of space – both
3. operation speeds:

               array      pointer
   INSERT      O(n)       O(1)
   DELETE      O(n)       O(1)
   PREVIOUS    O(1)       O(n)
   END         O(1)       O(1) or O(n)

4. pointer representation can be dangerous!
   e.g.   q := NEXT(p, L);
          INSERT(x, p, L);
          …
          if q = NEXT(p, L) then …      { now q ≠ NEXT(p, L)! }

DOUBLY-LINKED LISTS

[Figure: cells 1 … n holding a1 … an, each cell with both a next and a previous pointer.]
type
  celltype = record
    element : elementtype;
    next, previous : ↑celltype
  end;
  position = ↑celltype;

Position i : a pointer to cell i.

function LAST (L : LIST) : position;
begin
  LAST := L↑.previous
end;

WHAT HAPPENS IF POINTERS AREN'T AVAILABLE?  USE CURSORS!
PATTERN MATCHING IN STRINGS

A = {a1, a2, …, ak}    ALPHABET (SYMBOLS / CHARACTERS)

A STRING:  x = a1 a2 … an,   n ≥ 0,  ai ∈ A

STRINGS ARE A SPECIAL CASE OF LISTS.

PATTERN MATCHING:
  x = a1 a2 … an
  pat = b1 b2 … bm
Is pat a substring of x?  i.e.
  (∃ i : 1 ≤ i ≤ n−m+1)  ai ai+1 … ai+m-1 = b1 b2 … bm

Example:
        1 2 3 4 5 6 7 8 9 10 11
  x   = a a b b a b b b a a  a
  pat = bab   → yes, i = 4
  pat = abab  → no
42
SIMPLE ALGORITHM
x = aabbabbbaaa
pat = bab
aabbabbbaaa
bab
NO
aabbabbbaaa
bab
NO
aabbabbbaaa
bab
YES!
BUT FOR pat = aaa WE NEED TO MOVE
FROM aabbabbbaaa TO aabbabbbaaa
aaa
aaa
SIMILARLY for pat = abab, from
1234567891011
aabbabbbaaa
abab
TO
aabbabbbaaa
abab
8 = 11 – 4 +1
43
WORST CASE

x = a1 a2 … am am+1 … an-m+1 … an;  pat = b1 b2 … bm is aligned against each
of the n−m+1 starting positions in turn.

EACH PASS TAKES O(m) COMPARISONS, HENCE
  (n−m+1)·O(m) = O(m(n−m+1)) = O(mn)

procedure find (x, pat : STRING; var found : Boolean; var i : position);
{ found is set to false if pat doesn't occur in x; otherwise found is set to
  true and i is set to the first position in x where pat begins }
var p, q : position;
begin
  if not EMPTY(x) and not EMPTY(pat) then
  begin
    found := false;
    i := FIRST(x);
    while not found and i <> END(x) do
    begin
      p := i;  q := FIRST(pat);
      while RETRIEVE(p, x) = RETRIEVE(q, pat) and not found do
      begin
        p := NEXT(p, x);
        q := NEXT(q, pat);
        if q = END(pat) then found := true
      end;
      if not found then i := NEXT(i, x)
    end
  end
end;

IF END(L) IS O(1) THEN T(n, m) = O(mn).
THE KNUTH-MORRIS-PRATT (KMP) ALGORITHM

x   = abaababaabacabaababaabaab
pat = abaababaabaab
                 ↑ mismatch ('c' in x against 'a' in pat)

WHAT DO YOU DO NEXT?

If x = … u w u c …  and  pat = u w u a …, where u is the longest prefix of
the matched part of pat that is also a suffix of it, slide pat right so that
its leading u lines up with the u just scanned in x, and resume comparing at
the mismatched character of x:

  x   = abaababaabacabaababaabaab
        abaababaabaab               start comparing → mismatch
              abaababaabaab         start comparing → mismatch
                 abaababaabaab      start comparing → mismatch
                    abaababaabaab   comparing resumes, eventually a match

Repeating the idea, each shift reuses the part of pat already known to
match, so the scanned character of x is never backed up.

THE NUMBER OF COMPARISONS IS O(n).  BUT HOW DO WE FIND OUT WHAT u IS?
47
LET pat = b1 b2 … bm.  FOR EACH 1 ≤ j ≤ m, LET

  f(j) = the largest i such that 0 < i < j and b1…bi = b_{j-i+1}…b_j
       = 0 if no such i exists

f(j) < j.   f IS CALLED THE FAILURE FUNCTION.

  j      1  2  3  4  5  6  7  8  9  10 11 12 13
  pat    a  b  a  a  b  a  b  a  a  b  a  a  b
  f(j)   0  0  1  1  2  3  2  3  4  5  6  4  5
TIME COMPLEXITY:
T (n, m) = O (n + complexity of defining g)
= O (n + complexity of defining f)
0
f(j) =
if j = 1
fs(j -1) +1
where s is the smallest I such
that bfi(j-1) +1 = bj
0
if no such i exist
f i (j -1) = f (f(… f(j -1)…..)
i times
49
f 3 (j -1) = f (f (f (j-1)))
T(j-1) T1
u
j-1
a
u
a
j
f(j- 1)
HERE f(j) = f (j -1) +1
j -1
U
b u
a
i= f(j -1)
j
u
j -1
a
b
a
w
w
j
f(i) = f (f(i-1)) = f2 (i-1)
50
f2(i-1) +1
procedure fail (pat : array [1..m] of char; var f : array [1..m] of integer);
var i, j : integer;
begin
  f[1] := 0;
  for j := 2 to m do
  begin
    i := f[j-1];
    while (pat[j] <> pat[i+1]) and (i > 0) do i := f[i];
    if pat[j] = pat[i+1] then f[j] := i+1
    else f[j] := 0
  end
end

T(m) = O(m)!
51
procedure KMP (x, pat, g, found, i);
{ x[1..n], pat[1..m] are strings;  g[j] = g(j), 1 ≤ j ≤ m }
var p, q : integer;
begin
  found := false;
  if (n <> 0) and (m <> 0) then
  begin
    p := 1;  q := 1;
    while (p <> n+1) and (q <> m+1) do        { time O(n) }
      if x[p] = pat[q] then
      begin
        p := p+1;
        q := q+1
      end
      else if q = 1 then p := p+1
      else q := g[q];
    if q = m+1 then
      begin found := true; i := p-m end
    else found := false
  end
end;
52
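For reference, here is a compact C sketch of the failure function and the KMP matcher; it uses the standard 0-based convention (f[j] = length of the longest proper prefix of pat[0..j] that is also its suffix), which shifts the notes' indices by one, and the helper names and the size limit are ours.

#include <stdio.h>
#include <string.h>

static void compute_fail(const char *pat, int m, int f[]) {
    f[0] = 0;
    for (int j = 1; j < m; j++) {
        int i = f[j-1];
        while (i > 0 && pat[j] != pat[i]) i = f[i-1];   /* fall back via f */
        f[j] = (pat[j] == pat[i]) ? i + 1 : 0;
    }
}

/* returns the 0-based index of the first occurrence of pat in x, or -1 */
int kmp_find(const char *x, const char *pat) {
    int n = (int)strlen(x), m = (int)strlen(pat);
    if (m == 0) return 0;
    if (m > 256 || m > n) return -1;          /* sketch assumes m <= 256 */
    int f[256];
    compute_fail(pat, m, f);
    for (int p = 0, q = 0; p < n; p++) {
        while (q > 0 && x[p] != pat[q]) q = f[q-1];
        if (x[p] == pat[q]) q++;
        if (q == m) return p - m + 1;          /* match found */
    }
    return -1;
}

int main(void) {
    /* the lecture example: the match starts right after the 'c' */
    printf("%d\n", kmp_find("abaababaabacabaababaabaab", "abaababaabaab"));
    return 0;                                  /* prints 12 */
}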
ADT STACK

"LAST-IN-FIRST-OUT" LIST (LIFO)

OPERATIONS:
MAKENULL(S) : make stack S empty
TOP(S)  : return the top element of S;   TOP(S) = RETRIEVE(FIRST(S), S)
POP(S)  : delete the top element of S;   POP(S) = DELETE(FIRST(S), S)
          (sometimes POP is defined as a function that returns the element being popped)
PUSH(x, S) : insert x at the top of S;   PUSH(x, S) = INSERT(x, FIRST(S), S)
EMPTY(S) : return true if S is empty, false otherwise
53
A SIMPLE EXAMPLE:

# : erase character; it cancels the previous uncancelled character
@ : kill line; it cancels all previous characters on the line

abc#d@aa#b  ≡  ab

procedure EDIT;
var S : STACK of char;
    c : char;
begin
  MAKENULL(S);
  read(c);
  while not eoln do
  begin
    if c = '#' then POP(S)
    else if c = '@' then MAKENULL(S)
    else PUSH(c, S);
    read(c)
  end;
  { print S in reverse order }
end
54
ARRAY IMPLEMENTATION OF STACKS

[Figure: the stack grows from the high end of the array toward index 1; top
points at the topmost element, so positions top … max hold the stack and
top = max+1 means empty.]

type
  position = 1..max;
  STACK = record
    top : 1..max+1;
    elements : array [position] of elementtype
  end;

• PUSH, POP, TOP – O(1)
55
stack
MORE SPACE – EFFICIENT IMPLEMENTATION
POINTER IMPLEMENTAION
Stack
a
b
c
.
 MANY STACKS IN ONE ARRAY
TOP
1
2
STACK 1
3
STACK 2
BOTTOM
1
2
3
STACK 3
Stack pace
56
tree
ADT QUEUE
A QUEUE IS A “First – in – First – Out” LIST > (FIFO)
OPERATIONS:
MAKENULL (Q);
FRONT (Q) : return the first element of Q
FRONT (Q) = retrieve (first (Q), Q)
ENQUEUE (x, Q) : inserts x at the end of Q
ENQUEUE (X, Q) = INSERT (X, END (Q), Q)
DEQUEUE (Q): DELETES THE FIRST ELEMENT OF Q
DEQUEUE (Q) = DELETE (FIRST (Q),Q)
EMPTY (Q):
57
POINTER IMPLEMENTATION
header
a1
a2
…
front
near
type celltype = record
element : elementtype;
next : ↑ celltype
end;
QUEUE = record
Front, rear : ↑ celltype
End;
FUNCTION EMPTY (Q : QUEUE) : Boolean;
Begin
If Q. front = Q.rear then EMPTY: = true
Else EMPTY : = false
End
58
an
EACH OPERATION – 0 (1)
ARRAY IMPLEMENTATION

[Figure: elements stored in positions front … rear of an array of size max;
the 1st element is at front, the last at rear.]

DEQUEUE – O(1), but ENQUEUE – O(n) when elements have to be shifted.

CIRCULAR ARRAY IMPLEMENTATION (BUFFER!)

[Figure: the positions 1 … max are arranged in a circle; the queue a1 … an
occupies a contiguous arc from front to rear, and both indices wrap around
from max to 1.]

HOW DO WE DISTINGUISH BETWEEN FULL AND EMPTY?
• MAINTAIN AN EXTRA BIT, or
• leave one position unused and declare the queue FULL when
  FRONT = addone(addone(rear)).
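A minimal C sketch of the circular buffer; it uses the slightly different but equivalent convention that rear points one past the last element, so the queue is full when addone(rear) = front. MAX, the int element type and the function names are ours.

#include <stdio.h>

#define MAX 8                                  /* capacity is MAX - 1 */

typedef struct { int elems[MAX]; int front, rear; } Queue;
/* front = index of the first element, rear = index one past the last */

static int addone(int i)        { return (i + 1) % MAX; }
void  makenull(Queue *q)        { q->front = q->rear = 0; }
int   empty(const Queue *q)     { return q->front == q->rear; }
int   full(const Queue *q)      { return addone(q->rear) == q->front; }

int enqueue(int x, Queue *q) {                 /* returns 0 on overflow */
    if (full(q)) return 0;
    q->elems[q->rear] = x;
    q->rear = addone(q->rear);
    return 1;
}
int dequeue(Queue *q, int *x) {                /* returns 0 on underflow */
    if (empty(q)) return 0;
    *x = q->elems[q->front];
    q->front = addone(q->front);
    return 1;
}

int main(void) {
    Queue q; int x;
    makenull(&q);
    for (int i = 1; i <= 5; i++) enqueue(10 * i, &q);
    while (dequeue(&q, &x)) printf("%d ", x);  /* prints 10 20 30 40 50 */
    printf("\n");
    return 0;
}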
An application of stacks – searching a maze (the "rat in a maze"), by backtracking.

• mark[i, j] = 1 if we have been to (i, j), 0 otherwise
• IF THERE IS NO WAY OUT, BACK UP ONE CELL AND TRY A DIFFERENT MOVE
• WE MUST STORE THE CURRENT PATH SOMEWHERE

A PATH (i1, j1), (i2, j2), …, (is, js) is kept on a STACK, with (is, js) on top
and (i1, j1) at the bottom.

From cell (i, j) there are 8 possible moves:

  NW (i−1, j−1)   N (i−1, j)   NE (i−1, j+1)
  W  (i,   j−1)     (i, j)     E  (i,   j+1)
  SW (i+1, j−1)   S (i+1, j)   SE (i+1, j+1)

type
  offsets = record
    x : -1..1;
    y : -1..1
  end;
type directions = (N, NE, E, SE, S, SW, W, NW);
var move : array [directions] of offsets;

  d     move[d].x   move[d].y
  N        -1           0
  NE       -1           1
  E         0           1
  SE        1           1
  S         1           0
  SW        1          -1
  W         0          -1
  NW       -1          -1

var
  maze : array [0..m+1, 0..n+1] of 0..1;
  mark : array [1..m, 1..n] of 0..1;
63
type dir = (N, NE, E, SE, S, SW, W, NW, D);      { D = dead end }
type elementtype = record
  x : 1..m;
  y : 1..n;
  start : dir
end;
STACK = … ;
var path : STACK;

function NEXTMOVE (loc : elementtype) : dir;
var d : dir;
    s, r, i, j : integer;
    found : Boolean;
begin
  i := loc.x;  j := loc.y;  d := loc.start;  found := false;
  while (d <> D) and not found do
  begin
    s := move[d].x;  r := move[d].y;
    if (maze[i+s, j+r] = 0) and (mark[i+s, j+r] = 0)
      then found := true
      else d := succ(d)
  end;
  NEXTMOVE := d        { D means dead end }
end
procedure rat (var maze : array [0..m+1, 0..n+1] of 0..1);
var
  mark : array [1..m, 1..n] of 0..1;
  path : STACK;
  location : elementtype;
  d : dir;
  function NEXTMOVE (x : elementtype) : dir; …;
begin
  mark := (0);  MAKENULL(path);               { initialization }
  location := (1, 1, E);  mark[1, 1] := 1;
  PUSH(location, path);
  while not EMPTY(path) do
  begin
    location := TOP(path);  POP(path);
    d := NEXTMOVE(location);
    if d <> D then
    begin
      location.start := succ(d);  PUSH(location, path);
      location.x := location.x + move[d].x;
      location.y := location.y + move[d].y;
      if (location.x = m) and (location.y = n) then
        begin print(path); return end
      else begin
        PUSH((location.x, location.y, N), path);
        mark[location.x, location.y] := 1
      end
    end
  end
end

TIME COMPLEXITY OF rat : O(mn)
SPACE COMPLEXITY OF rat : O(mn)
BUT WITHOUT USING mark the time is exponential, O(8^(mn)).
An application of queues - breadth-first search in trees
tree T
V1
V2
a1
a2
V3
a3
V4
V5
a4
V6
a5
a6
V9
V7
a7
V8
a8
V10
a9
66
a10
V11
a11
binary
LEFT (v),
║
left child
of V
RIGHT(v)
║
right child
of V
e.g., LEFT (v3 = null, RIGHT (V3) = V6
DATA(V1) = a1
ROOT(T) = v1
Searching in tree
Given tree T and data x,
find a node v of T s.t. DATA(v) = X.
Possible approaches:
1. Breadth-first search
try level 1, then level 2, then
level 3, ...etc.
2. Depth-first search:
search along the leftmost path until the leaf is reached, then backup, try the 2nd leftmost path, ...etc.
67
Breadth-first
X = 20
V1
10
50
V2
V3
5
V4
V5
2
V6
60
20
V12
V10
V7
4
V8
2
V9
V11
5
20
Searching v1, v2, v3, v4, v5, v6, . . .
Depth-first
Searching order: v1, v2, v3, v4, v5, v6, . .
68
7
30
procedure DSearch (x, T);
begin
  if x = DATA(ROOT(T)) then
    print ROOT(T)
  else begin
    DSearch(x, left subtree of T);
    DSearch(x, right subtree of T)
  end
end;

Nonrecursive version:

procedure DSearch (x, T);
var path : STACK of nodes;
    v : node;
begin
  v := ROOT(T);  MAKENULL(path);
  PUSH(v, path);
  while not EMPTY(path) do begin
    v := TOP(path);  POP(path);
    if DATA(v) = x then print v
    else begin
      PUSH(LEFT(v), path);
      PUSH(RIGHT(v), path)
    end
  end
end

Time: O(n)
Space: O(n);  average space: O(log n)
procedure BSearch (x, T);
var level : QUEUE of nodes;
    v : node;
begin
  v := ROOT(T);
  MAKENULL(level);
  ENQUEUE(v, level);
  while not EMPTY(level) do
  begin
    v := FRONT(level);  DEQUEUE(level);
    if DATA(v) = x then
      begin print v; stop end
    else begin
      ENQUEUE(LEFT(v), level);
      ENQUEUE(RIGHT(v), level)
    end
  end
end;

Time: O(n),  n = |T|  (the size of T)
Space: O(n);  average: O(n)
70
Application – implement a DOS command cd:\
Cd:\ name – change current directory to subdirectory name
A:
job
letters
study
WP 5.0
letters
project
homework
letters
What should cd:\letters do?
BFS
DFS?
When do we use DFS?
e.g., solution tree
71
Proc. A(x1, x2,....)
Var y1, y2, …
Begin
.
.
A(a1, a2, …)
L1
….
….
Proc. A(x1, x2,....)
Var y1, y2, …
begin
.
.
.
.
.
Proc. A(x1, x2,....)
Var y1, y2, …
begin
.
.
.
.
.
A(b1, b2, …)
L3
.
.
.
.
.
.
B(c1, c2, …)
L3
.
.
.
.
.
.
.
72
Proc. B(x1, x2,....)
Var f1, f2, …
begin
.
.
.
.
.
.
.
.
.
.
.
.
.
Ellmination of Recursion
Sometimes it is absolutely necessary to eliminate recursive
• recursive calls are not supported e.g., FORTRAN
• speed is the first priority - do it by yourself
Solution: STACK of activation records
Generally, an activation record holds
1. current values of the parameters (pass by value)
2. current values of the local variables
3. a label indicating return address
Assume that if procedure p(x1, x2 …. var y1, y2, ….)
then the recursive call is p(a1, a2, …, y1, y2, ….)
73
General Rules:
Procedure P (x1, x2: int; var y: int);
Var i, j: int;
Begin
____________________________________;
____________________________________;
.
.
.
P(a1, a2,y)
_________________________________________;
_________________________________________;
. .
.
74
end;
Example 1

procedure Ackermann (m, n : integer; var A : integer);
begin
1.  if (n < 0) or (m < 0) then writeln('error')
    else
2.  if m = 0 then A := n+1
    else
3.  if n = 0 then Ackermann(m-1, 1, A)
    else begin
      Ackermann(m, n-1, A);
      Ackermann(m-1, A, A)
    end
end;
75
Recursion Elimination

procedure Ackermann (m, n : integer; var A : integer);
label 1, 2, 3;
var S : STACK of record
          m, n, l : integer
        end;
begin
  MAKENULL(S);
1: if (n < 0) or (m < 0) then writeln('error')
   else
   if m = 0 then A := n+1
   else
   if n = 0 then
   begin
     PUSH((m, n, 3), S);
     m := m-1;  n := 1;
     goto 1
   end
   else begin
     PUSH((m, n, 2), S);
     n := n-1;  goto 1;
2:   PUSH((m, n, 3), S);
     m := m-1;  n := A;  goto 1
   end;
3: if not EMPTY(S) then
   begin
     (m, n, l) := TOP(S);  POP(S);
     case l of
       2: goto 2;
       3: goto 3
     end
   end
end; {Ackermann}
More details in [AHU] pp. 64- 69.
[HS] pp. 150-153.
* The method works only when
 no pass-by-reference parameters
 or, same p-b-r parameters are passed each time (e.g.,
function)
General case???
POINTER!!!
p(...,var x:type1)
 p(...,xp:↑type1)
77
no global variables
procedure
R(x:integer var y,z: integer);
var
i: integer;
begin
---------------------
--------------------y:=x*i
------------------------------------R(a,i,y);
-------------------
-----------------end;
Trees
78
Basic Terminology
1
2
4
3
12
5
7
6
1.
2.
9
8
11
10
a single node is a tree, also the root.
if T1,T2 are trees with roots
n1, n2, …., nk. Then
n
nk
T=
n1
n2
T1
T2
Is a tree with root n.
n1, n2, …, nk are the children of n.
actually. A rooted tree or oriented
siblings
79
TR
a subtree of n(and of T)
n is the parent of n1, n2, …, nk
Note the every node (except root) has a unique parent.
A node with no children is a leaf.
A non-leaf node is also called an internal node.
n1
n2
n3
nk
n1, n2, n3,…., nk is a path of length k-1 from n1 to nk
Note: n1 is a path of length 0
n1 is an ancestor of nk
nk is a desendent of n1
height of n: length of the longest path from n to a leaf
80
height of a leaf is 0!
depth of n: length of the unique path from root to n.
depth of root is o!
height (or depth) of a tree: height of root.
Order of nodes
in a tree,
siblings are ordered from left-to-right
(ordered)
a
≠
a
b
c
c
b
if n is to the left of n2 then all descendents of n1 are to the left of all
descendents of n2
81
Tree Traversals
n
T
T1
T2
Preorder traversal of T is
n,
DFS
preorder traversal of T1
preorder traversal of T2
preorder traversal of Tk
Inorder traversal of T is
i.t. of T1 n, i.t. of T2 ..., i.t. of Tk
Postorder traversal of T is
p.t. of T1 p.t. of T2, ..., p.t. of Tk, n.
↑ evaluation of expression trees, divide-and conquer
82
TR
Example
1
1
4
2
3
10
9
8
7
5
6
Preorder:
1, 2, 5, 3, 6, 7, 4, 8, 9, 10
Inorder:
5, 2, 1, 6, 3, 7, 8, 4, 9, 10
Postorder:
5, 2, 6, 7, 3, 8, 9, 10, 4, 1
Preorder:
we list a node the first time we pass it
Postorder:
we list a node the last time we pass it
Inorder:
we list
the first time, but list an interior node the
second time we pass it
83
procedure Preorder (T:tree);
var
v: node;
begin
V := ROOT(T);
Print v;
for each subtree T of v, from left to right
do
Preorder (T)
end;
time complexity: O(|T|) ← number of nodes
Pre/In/Post
space complexity: 0 (height of T) ← stack
84
Procedure Preorder (T:tree); // no stack //
var
v: node;
begin
V := ROOT(T)
while v ≠ null do
begin
print v;
if v ≠ leaf then v := 1st child of v
back up until
v is not the
last child of
parent(v)
else
while
v ≠ null and
v = last child of Parent(v)
do
v := Parent(v);
if v ≠ null then
v := next sibling of v
end
end;
time = O(|T|) if parent () is 0(1)
space ÷?
85
Reconstructing a tree from its traversals
 Preorder and Postorder traversals are sufficient.
 Preorder and Inorder traversals aren’t sufficient.
a
a
b
c
b
e
d
c
d
Inorder and Postorder traversals aren’t sufficient.
example trees?
Any single traversal isn’t sufficient.
(pre/in/post)
86
e
Labelled Trees, Expression Trees
n1 *
+
+
n3
n2
a
a
b
n4
n5
c
n6
n7
n2 represents a+b
n3 represents a+c
n1 represents (a + b) * (a + c)
Evaluation can be done by a postorder traversal.
pre/in/post-order listings give
prefix (Polish),
infix,
postfix (Reverse Polish)
↑
*+ab+ac
↑
a+b*a+c
↑
ab+ac+*
87
ADT TREE
1. PARENT (n,T).: node. If no parent return null node.
2. LEFTMOST-CHILD (n,T) : node
3. RIGHT-SIBLING (n,T): node
returns the sibling immediately following n.
4. LABEL (n,T): label
≡ DATA(n,T)
5. ROOT(T) : node
6. MAKENULL(T)
7. CREATEL (v1, T1, T2 Ti ): tree; i=O,1,2,...
v
n
T1
Ti
T2
Alternative: ATTACH (T1 T2 ) : tree
8. DELETE (n,T) - delete the subtree rooted at n.
88
a1
n1
a2
a5
n2
a3
a4
n3
n4
a7
n5
a9
a6
a8
n6
n6
n8
LEFTMOST-CHiLD (n1, T = n2 )
RIGHT-SIBLING (n1, T) = n4
RIGHT-SIBLING (n7, T) = ^
procedure PREORDER (n:node);
//list labels of descendents of n in T (global)
in preorder!!
var
begin
print LABEL (n,T);
n := LEFTMOST-CHILD (n,T)
while n ≠ ^ do begin
PREORDER (n);
n := RIGHT-SIBLING (n,T)
end
end;
89
n9
Array Implementation (array of parents)

[Figure: a tree with 10 nodes labelled a, b, c; node 1 is the root.]

  node    1  2  3  4  5  6  7  8  9  10
  parent  0  1  1  2  2  5  5  5  3  3       (0 ≡ Λ)
  label   a  b  a  c  b  a  b  c  a  b

e.g. PARENT(10, T) = 3.

If node i is to the left of node j then i < j, i.e. number siblings from
left to right (e.g., preorder, or even inorder).

type
  node = 1..max;
  cell = record
    parent : 0..max;
    label : labeltype
  end;
  TREE = array [1..max] of cell;
function LEFTMOST-CHILD (n : node; T : TREE) : node;
var i : integer;
begin
  i := 1;
  while (i <= max) and (T[i].parent <> n) do
    i := i+1;
  if i > max then LEFTMOST-CHILD := 0
  else LEFTMOST-CHILD := i
end;                                           { time: O(|T|) }

function RIGHT-SIBLING (n : node; T : TREE) : node;
var i : integer;  parent : node;
begin
  parent := T[n].parent;
  i := n+1;
  while (i <= max) and (T[i].parent <> parent) do
    i := i+1;
  if i > max then RIGHT-SIBLING := 0
  else RIGHT-SIBLING := i
end;                                           { time: O(|T|) }
91
Trees as lists of children
Label children node right sibling
1
2
.
.
.
.
.
.
3
4
5
6
7
8
9
10
6
node space
type
node = 1 .. max
LIST = …
TREE = record
header : array [1..max] of LIST;
labels : array [1..max] of labletype
root : node
end;
no matter how LIST is implemented,
LEFTMOST-CHILD; RIGHT- SIBLING _ 0(1)
PARENT – 0(|T|)
If want 0(1) for all, add parent field
92
7
Considering CREATE (n, T1, T2, …, Ti);
node space
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
T1
T
T2
A
6
C
4
.
12
.
.
B
10
G
I
D
F
E
.
H
.
.
2
.
.
11
8
14
I
A
E
T1
T2
B
C
F
D
93
G
H
Simplified
Leftmost=child & right-sibling representation
A
C
B
D
Leftmost
Child
label
right
siblings
3
8
B
5
5
0
C
0
7
8
3
0
A
D
0
0
Var cellspace
:
array [1..max] of record
Label : labeltype;
Leftmost-child, right-sibling:0 .. max
End
94
SUMMARY
1.
Array of Parents
• PARENT--O(1)
• LEFTMOST-CHILD, RIGHT-SIBLING - O(|T|)
ALL-CHILDREN — 0(m)
• simple, space-efficient
2.
List of Children
• LEFTMOST-CHILD - 0(1)
• PARENT, RIGHT-SIBLING -- 0(|T|)
• can store several trees, CREATE
3.
Leftmost-child, Right-sibling
• LEFTMOST-CHILD, RIGHT-SIBLING -- 0(1)
• PARENT — O(|T|)
• make tree, CREATE, slightly more space than (2)
95
BINARY TREES
 A node is a binary tree
 If T is a binary tree, v is a node, then
V
V
T
T
If T1, T2 binary trees, v a node then
V
T2
T1
A binary tree is NOT a tree!!!
A
A
B
≠
96
B
Binary Trees

• A child is either a left or a right child.
• Binary trees are not really trees.

full binary tree: every internal node has two children and all leaves have
the same depth.

complete binary tree: obtained from a full binary tree as follows: fix a
leaf and delete all the leaves to the right of it.

• number of nodes of depth i ≤ 2^i
• size of a binary tree of depth d ≤ Σ_{j=0}^{d} 2^j = 2^{d+1} − 1
• if complete:  2^d − 1 < size ≤ 2^{d+1} − 1
• if full:      size = 2^{d+1} − 1
• size − 1  ≥  depth  ≥  log₂(size + 1) − 1
97
Binary tree traversals
v
T1
T2
Preorder (T):
V, preorder (T1), preorder (T2)
Inorder (T):
*
Inorder (T1), v, inorder (T2)
v
T2
Postorder (T):
Postorder (T1), postorder (T2), v
98
How to reconstruct a binary tree from its traversals?

• Just the preorder (or inorder, or postorder) traversal is not enough.
• Preorder & postorder aren't enough!  (A root a with a single left child b
  and a root a with a single right child b have the same preorder ab and the
  same postorder ba.)

• Preorder a1, a2, …, an AND inorder b1, b2, …, bn are enough:
  1. Find i s.t. a1 = bi.
  2. Then  T1 = Reconstruct(a2, …, ai ;  b1, …, bi-1)
           T2 = Reconstruct(ai+1, …, an ;  bi+1, …, bn)
     and T is the tree with root a1, left subtree T1 and right subtree T2.

• Postorder & inorder: similar.
99
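A small C sketch of the reconstruction just described (preorder + inorder, labels assumed distinct); the names Node and build, and the example tree, are ours.

#include <stdio.h>
#include <stdlib.h>

typedef struct Node { char label; struct Node *left, *right; } Node;

/* pre[0..n-1] and in[0..n-1]: preorder and inorder of the same tree */
Node *build(const char *pre, const char *in, int n) {
    if (n == 0) return NULL;
    Node *t = malloc(sizeof *t);
    t->label = pre[0];                         /* a1 is the root */
    int i = 0;
    while (in[i] != pre[0]) i++;               /* find i with a1 = b_i */
    t->left  = build(pre + 1,     in,         i);         /* a2..ai ; b1..b(i-1)   */
    t->right = build(pre + 1 + i, in + i + 1, n - i - 1);  /* a(i+1)..an ; b(i+1)..bn */
    return t;
}

void postorder(const Node *t) {                /* print postorder as a check */
    if (t == NULL) return;
    postorder(t->left);
    postorder(t->right);
    putchar(t->label);
}

int main(void) {
    /* root a, left child b (children d, e), right child c */
    Node *t = build("abdec", "dbeac", 5);
    postorder(t);                              /* prints debca */
    putchar('\n');
    return 0;
}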
Representation of binary trees
A
B
.
D
.
C
.
.
.
E
.
Type node = record
label : labeltype;
left, right : ↑ node
end
TREE ↑ node;
Notes:
1. cursors may also be used.
2. if operation PARENT ( ) is crucial, a parent
field could be included.
3. but if traversal is the only concern, then
the parent field is not really needed.
100
F
.
procedure PREORDER (T : TREE);
var temp, tempparent, tempchild : ↑node;

  procedure BACKUP;
  { find the successor of temp in the preorder traversal }
  var stop : boolean;
  begin
    stop := false;
    temp := tempparent;
    while (temp <> nil) and not stop do
    begin
      if temp↑.tag = 0 then
      begin
        tempparent := temp↑.left;
        temp↑.left := tempchild;           { restore the left pointer }
        if temp↑.right <> nil then
        begin
          tempchild := temp↑.right;
          temp↑.right := tempparent;       { save the parent in the right pointer }
          temp↑.tag := 1;
          tempparent := temp;
          temp := tempchild;
          stop := true; return
        end
      end
      else begin                            { temp↑.tag = 1 }
        tempparent := temp↑.right;
        temp↑.right := tempchild            { restore the right pointer }
      end;
      tempchild := temp;  temp := tempparent
    end
  end; { end of BACKUP }

begin
  { print nodes of T in preorder }
  temp := T;
  tempparent := nil;
  while temp <> nil do
  begin
    print temp↑.label;
    if temp↑.left <> nil then begin
      tempchild := temp↑.left;
      temp↑.left := tempparent;             { save the parent in the left pointer }
      temp↑.tag := 0;
      tempparent := temp;
      temp := tempchild
    end
    else if temp↑.right <> nil then begin
      tempchild := temp↑.right;
      temp↑.right := tempparent;
      temp↑.tag := 1;
      tempparent := temp;
      temp := tempchild
    end
    else
      { temp↑.left = temp↑.right = nil }
      BACKUP
  end
end; { end of PREORDER }
102
Threaded binary trees
0
0
0
0
.
1
1
lefttag
righttag =
=
0
1
1
1
0
1
→ left = leftchild
→left = leftthread (predecessor) in inorder
0
1
→ right = right child
right thread (successor)
predecessor/successor in inorder can be found without using stack or
flipping
103
.
.
Representation of complete binary trees

[Figure: a complete binary tree with nodes A … J numbered 1 … 10 level by
level, stored in an array:

   i     1 2 3 4 5 6 7 8 9 10
  T[i]   A B C D E F G H I J   ]

parent of node i      = ⌊i/2⌋  (largest integer ≤ i/2),   1 < i ≤ n
left child of node i  = 2i,       if 2i ≤ n
right child of node i = 2i+1,     if 2i+1 ≤ n

type TREE = record
  n : 0..max;
  labels : array [1..max] of labeltype
end;
104
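The index formulas above written out in C, with a tiny sanity check on the 10-node tree A … J (the function names are ours):

#include <stdio.h>

int parent(int i)             { return i / 2; }           /* floor(i/2), 1 < i <= n */
int left_child(int i, int n)  { return 2*i     <= n ? 2*i     : 0; }
int right_child(int i, int n) { return 2*i + 1 <= n ? 2*i + 1 : 0; }

int main(void) {
    const char labels[] = " ABCDEFGHIJ";                    /* labels[1..10] */
    int n = 10;
    printf("parent of %c is %c\n", labels[5], labels[parent(5)]);            /* B */
    printf("children of %c: %c %c\n", labels[2],
           labels[left_child(2, n)], labels[right_child(2, n)]);             /* D E */
    return 0;
}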
A
B
D
C
.
.
.
.
E
.
.
H .
Var
temp, tempparent, tempchild : ↑ node;
tag =
0→
left points to parent
1→
right points to parent
type
node = record
label : labeltype;
left, right: ↑ node;
tag:0..1;
end;
TREE = ↑ node;
105
F .
G
.
An application of binary trees – Huffman codes

characters:  A = {a1, a2, …, ak}
a string or message:  x1 x2 … xn,  xi ∈ A
p(ai) – the probability that ai will appear in a message

Encoding: assign a binary code c(ai) to each ai;
  c(x1 x2 … xn) = c(x1) c(x2) … c(xn)

Decoding: given a code b1 b2 … bm, find the unique message x1 x2 … xn such
that c(x1 x2 … xn) = b1 b2 … bm.

Average code length:   Σ_{i=1}^{k} p(ai)·|c(ai)|,   where |c(ai)| is the length of c(ai).
106
character   probability   code 1   code 2   code 3
    a           .30         000      01       00
    b           .10         001      0010     01
    c           .10         010      0011     10
    d           .10         011      000      000
    e           .40         100      1        1

average                      3       2.1      1.7

Prefix property: c(ai) is not a prefix of c(aj) for any j ≠ i.
e.g., Code 1 and Code 2 have the prefix property; Code 3 doesn't!

Claim: the prefix property makes decoding easy, e.g. consider decoding 000…
  Code 1: a …     (on-line decoding)
  Code 2: d …
  Code 3: ??  (ambiguous)
107
Huffman Code – an optimal (least average code length) prefix code

Algorithm Huffman ({a1, a2, …, an});
{ find the Huffman code c(ai) for each ai }
  if n = 1 then c(a1) := ε
  else begin
    let ai and aj be two characters such that p(ai) and p(aj) are the lowest
      among a1, a2, …, an;
    let a' be a new character with p(a') = p(ai) + p(aj);
    Huffman({a1, a2, …, an} − {ai, aj} + {a'});
    c(ai) := c(a')·0;
    c(aj) := c(a')·1
  end
end;

Example: {a, b, c},   p(a) = 0.5,  p(b) = 0.3,  p(c) = 0.2

Huffman({a, b, c}) merges b and c into [bc];
Huffman({a, [bc]})  ⟹  c(a) = 0, c([bc]) = 1
                    ⟹  c(a) = 0, c(b) = 10, c(c) = 11
Binary tree representation of prefix code
0
1
1
0
0
0
0
0
e
a
d
0
1
0
1
0
a
b
c
d
e
code 1
0
b
1
c
code 2
type
node = record
left, right :↑ node;
probability : real ;
character : {a1, a2, …., ak)
end;
used only in leaves
A more efficient implementation is given in [AHU] pp.94 -101
109
example
1)
a
.10
b
.20
c
.05
d
.05
e
.10
.10
.20
.05
.05
a
b
c
d
.10
.20
.10
.10
f
.30
g
.10
h
.10
.10
.30
.10
.10
e
f
g
.30
.10
.10
h
2)
a
b
e
.05
f
g
h
.05
c
d
(3) & (4)
.20
.20
called a forest
a
.10
.10
.05
.05
c
d
110
.10
.10
e
g
.20
.30
b
f
.10
h
(5)
.20
.20
called a forest
a
.10
.10
.10
.10
e
g
.05
c .30
d
.30
.30
f
.20
.10
b
h
(6)
.20
.40
.10
.10
.10
.20
g
e
.10
a
.05
c
.05
d
111
(7)
.20
.40
.10
.10
.10
.20
g
e
.05
.10
a
c
.05
d
.60
.30
.30
f
.20
b
.10
h
112
(6)
1
0
1
0
1
0
f
0
0
1
0
0
1
1
1
e
g
b
h
a
c
d
using a modified preorder listing, we can print the Huffman codes for
the characters (using a stack)
Algorithm Huffman-Tree;
{ construct a Huffman tree for characters a1, a2, …, an }
var forest : array [1..max] of TREE;
    p : real;
begin
  for i := 1 to n do
  begin
    new(forest[i]);
    forest[i]↑.left := nil;
    forest[i]↑.right := nil;
    forest[i]↑.probability := p(ai);
    forest[i]↑.character := ai
  end;
  while forest contains more than one tree do begin
    i := index of the tree with the smallest probability;
    j := index of the tree with the second smallest probability;
    p := forest[i]↑.probability + forest[j]↑.probability;
    forest[i] := CREATE2((p, −), forest[i], forest[j]);
    delete tree forest[j]
  end
end
114
A set is a collection of elements/members
Notes:
1. An element can be a set!
2. A set can be infinite or empty.
3. Usually (in this course), members of a set are of the same type.
4. Members of a set are different (otherwise, a multiset).
5. Members could be nearly ordered.
A relation is a linear order on some set S
(i)
(ii)
for any a + b in S. exactly one of a<b, a+b, a>b is true.
(Trichotomy)
for a,b,c in S,
a<b, b<c ==> a<c (Transitivity)
115
Some notation:
S = { a1, a2, …an}
or S = (x|x satisfies condition?)
e.g. {1,2,...,10} = (x|x is an integer and 1 ≤ x ≤ 10)
Ø = {}
Membership:
x є S,
x ∉ S
inclusion:
S1⊆ S2
S1⊈ S2
(subset)
S1⊆ S2
iff S1 ≠ S2 and S1 ⊆ S2
superset
S1⊇ S2
proper superset:
S1⊇ S2
Union:
S1∪ S2
{1, 2}∪ (2, 3)={1,2,3}
Intersection:
S1∩S2
{1, 2} ∩{1, 3} = {2}
Difference:
S1-S2
{1, 2} – {2, 3} = {1}
116
ADT SET

1.  MAKENULL(S):   S := ∅
2.  INSERT(x, S):  S := S ∪ {x}
3.  DELETE(x, S):  S := S − {x}
4.  MEMBER(x, S):  true iff x ∈ S
5.  ASSIGN(A, B):  copy B into A
6.  EQUAL(A, B):   true iff A = B
7.  UNION(A, B, C):         C := A ∪ B
8.  INTERSECTION(A, B, C):  C := A ∩ B
9.  DIFFERENCE(A, B, C):    C := A − B
10. MERGE(A, B, C):  if A ∩ B = ∅ then C := A ∪ B, otherwise C is undefined
11. MIN(S):  returns the minimum element in S (assuming S is linearly ordered)
12. FIND(x): with disjoint A1, A2, …, An (global), find the unique Ai s.t. x ∈ Ai
13. SIZE(S), SUBSET(A, B), COMPLEMENT(A), …
117
SET with Union, Intersection, Difference
Example – data-flow analysis
B1
1.
2.
3.
GEN = {1,2,3}
KILL = {4, 5, 6, 7, 8, 9}
t: = ?
p:= ?
q:= ?
4.
5.
read (p)
read (q)
B2
GEN = {4,5}
KILL = {2, 3, 7, 8}
q ≤ p?
GEN = KILL
y
B3
GEN = {6}
KILL = {1, 9}
6. t : = p
B4
7.
8.
P:=q
q:=t
GEN = {7, 8}
KILL = {2, 3, 4, 5}
B6
GEN = KILL = ∅
P mod q =0
B6
y
Write (q)
9. t : = pmodq
B8
GEN = KILL = ∅
B7
GEN = {9}
KILL = {1,6}
GEN[i] =
{data definition in B1}
KILL[i] = {d|d ∊ Bi & ∊ d ė Bi
defining same var as D}
118
DEF1NE[i]
{d|∃ a path B1….BiBi, such that d is the last
definition of the variable defined d in the path }
reaching definitions
of Bi
DEFIN = (1,4,5)
DEFIN = (4,5,6,7,8,9)
GEN[i]= {data definitions in Bi }
KILL[i] = (data definitions not in B), but defining the same variables
as GEN[i]
DEFOUT[i] = {d|(same as in DEFIN[I] except “Bi…BiBi”)}
leaving definitions
DEFOUT[i] = (DEFIN[i] – KILL[i]) ∪GEN[i]
DEFIN[i] = ∪ DEFOUT[i]
Bi is a
predeceasor
of Bi,
i.e. there is an arc from Bi to Bi)
119
Algorithm dataflow (GEN, KILL; var DEFIN);
var
  temp : SET;
  i : integer;                       { n = number of blocks }
  changed : boolean;
begin
  for i := 1 to n do begin
    MAKENULL(DEFIN[i]);
    MAKENULL(DEFOUT[i])
  end;
  repeat
    changed := false;
    for i := 1 to n do begin
      DIFFERENCE(DEFIN[i], KILL[i], temp);
      UNION(temp, GEN[i], temp);
      if not EQUAL(temp, DEFOUT[i]) then begin
        ASSIGN(DEFOUT[i], temp);
        changed := true
      end
    end;
    for i := 1 to n do begin
      MAKENULL(DEFIN[i]);
      for each predecessor Bj of Bi do
        UNION(DEFIN[i], DEFOUT[j], DEFIN[i])
    end
  until not changed
end;
120
Example
B1
B2
1.
2.
read (x)
read (y)
3.
4.
x: = x+y
z: = 10.0
GEN[1] ={1,2}
KILL[1]= {3,5}
GEN[2] = {3 4}
KILL[2] = {1}
x z?
B3
5.
GEN[3] = KILL [3] =Ø
y :=x*z
GEN[4] ={5}
KILL[4] = {2}
B4
DEFOUT[I] =
(DEFIN[I] – KILL  GEN[I]
DEFIN[I]=  DEFOUT[j]
Bj is a predecessor of BI
iteration
DEFIN[1]
DEFOUT[1]
DEFIN[2]
DEFOUT[2]
DEFIN[3]
DEFOUT[3]
DEFIN[4]
DEFOUT[4]
Ø
1
Ø
Ø
Ø
Ø
Ø
Ø
Ø
Ø
Ø
1,2
1,2
3,4
3,4
Ø
Ø
5
2
3
4
3,4
2,3,4
2,3,4
1,2
1,2,4
1,2,4
1,2
1,2,4
1,2,4
2,3,4
2,3,4
2,3,4
2,3,4
2,3,4
2,3,4
3,4
2,3,4
2,3,4
3,4
2,3,4
2,3,4
5
3,4,5
3,4,5
121
BIT-VECTOR IMPLEMENTATION

Universal set: {1, 2, …, N} (e.g. {A, B, …, Z});   S ⊆ {1, 2, …, N}

[Figure: S is a Boolean array of length N; position i is true iff i ∈ S.]

const N = ?;
type SET = packed array [1..N] of boolean;

procedure UNION (A, B : SET; var C : SET);
var i : integer;
begin
  for i := 1 to N do
    C[i] := A[i] or B[i]
end;

MEMBER, INSERT, DELETE – O(1)
MAKENULL, ASSIGN, EQUAL, UNION, DIFFERENCE, INTERSECTION, EMPTY – O(N)
Linked-list implementation

• most general, size is unlimited
• efficient if the elements are ordered by "<"
• in that case, a set is represented as a sorted list a1, a2, …, an with
  a1 < a2 < … < an

Unsorted:
  MAKENULL, EMPTY – O(1)
  INSERT, MEMBER, DELETE, ASSIGN – O(n)
  EQUAL, UNION, INTERSECTION, DIFFERENCE – O(nm)

Sorted:
  MAKENULL, EMPTY – O(1)
  INSERT, MEMBER, DELETE, ASSIGN, EQUAL, UNION, INTERSECTION,
  DIFFERENCE – O(n) or O(n+m)

MEMBER can be improved to O(log n) if balanced search trees are used.
ADT Dictionary

SET with INSERT, DELETE, MEMBER, and MAKENULL.

Example: Dean's list database

program deanlist (input, output);
type name = packed array [1..20] of char;
     grade = -1..12;
var student : name;
    average : grade;
    database : DICTIONARY;        { of names }
begin
  MAKENULL(database);
  readln(student, average);
  while student <> '' do begin
    case average of
      10..12 : INSERT(student, database);
      9 :      ;
      0..8 :   DELETE(student, database);
      -1 :     if MEMBER(student, database)
               then writeln('yes')
               else writeln('no')
    end;
    readln(student, average)
  end
end
124
A modified dictionary
Type
Elementtype = record
Key : keytype;
Data : datatype
End;
Then
MAKENULL, INSERT,
DELETE
QUERY (x:keytype) : datatype;
INSERT((key,data), dictionary)
DELETE(KEY, dictionary)
QUERY(key, dictionary)
125
Implementation of dictionary
1.
Bit-vector if the universal ser is {1,2,…}
INSERT, DELETE, MEMBER – O(1)
2.
Sorted or
unsorted
o(n)
INSERT O(n)
DELETE or
MEMBER O(logn)
INSERT
O(n)
DELETE
MEMBER – o(n)
If set is
ordered
1.
Unsorted array (of some constant size)
Type
DICTIONARY = record
Last : 0..max+1
Data : array [1..max] of element type end;
Procedure MAKENULL (var : A DICTIONARY)
Begin
A last :=0
End;
0(1)
126
function MEMBER (x : elementtype; var A : DICTIONARY) : boolean;
var i : integer;
begin
  for i := 1 to A.last do                       { O(n) }
    if A.data[i] = x then return(true);
  return(false)
end

procedure INSERT (x : elementtype; var A : DICTIONARY);
begin
  if not MEMBER(x, A) then                      { O(n) }
    if A.last < max then begin
      A.last := A.last + 1;
      A.data[A.last] := x
    end
    else error('full')
end;

procedure DELETE (x : elementtype; var A : DICTIONARY);
var i : integer;
begin
  find the i s.t. A.data[i] = x, or i > A.last;  { O(n) }
  if i <= A.last then begin
    A.data[i] := A.data[A.last];                 { overwrite with the last element }
    A.last := A.last - 1
  end
end
Hashing – O(1) time per operation on average
(INSERT, DELETE, MEMBER)

[Figure: if the universal set is small, represent set S by an array indexed
by the elements a1 … an: put ai in cell i iff ai ∈ S.  This gives O(1) time
if rank(ai) = i can be computed in O(1) time.]

Generally, partition the elements into groups and let all elements in a
group share a cell.

O(1) time if h(x) = i (x is in group i) can be computed in O(1) time.

Perfect if elements from the same group never occur simultaneously!
Good if it is unlikely that two elements from the same group occur simultaneously!
Okay if not TOO MANY elements from the same group occur simultaneously!

Some hash functions:  i mod p;  sum of digits, e.g. h(135) = 1+3+5 = 9.
Hashing

Goal: O(1) per operation on average (INSERT, DELETE, MEMBER),
      Pr(time > C) << 1.0

Open hashing

Partition the elements into B classes (buckets).
Hashing function h(x) = i if x ∈ class i,  0 ≤ i ≤ B−1.

[Figure: a bucket table of B headers 0 … B−1; header i points to a linked
list of the elements currently in bucket i.]

Average time = 1 + N/B per operation.
If N ≤ C·B, the average time ≤ 1 + C.
129
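A minimal open-hashing sketch in C for a set of ints: B buckets, each a singly linked list, h(x) = x mod B. The bucket count and all names are ours.

#include <stdio.h>
#include <stdlib.h>

#define B 7                                      /* number of buckets */

typedef struct Cell { int key; struct Cell *next; } Cell;
static Cell *bucket[B];                          /* the bucket table (headers) */

static int h(int x) { return ((x % B) + B) % B; }

int member(int x) {
    for (Cell *c = bucket[h(x)]; c != NULL; c = c->next)
        if (c->key == x) return 1;
    return 0;
}
void insert(int x) {
    if (member(x)) return;                       /* keep the set property */
    Cell *c = malloc(sizeof *c);
    c->key = x;
    c->next = bucket[h(x)];                      /* push at the front: O(1) */
    bucket[h(x)] = c;
}
void delete(int x) {
    Cell **p = &bucket[h(x)];
    while (*p != NULL && (*p)->key != x) p = &(*p)->next;
    if (*p != NULL) { Cell *dead = *p; *p = dead->next; free(dead); }
}

int main(void) {
    insert(3); insert(10); insert(17); delete(10);
    printf("%d %d %d\n", member(3), member(10), member(17));   /* 1 0 1 */
    return 0;
}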
Closed Hashing

[Figure: a bucket table 0 … B−1 in which each bucket holds at most one element.]

Insert x:
  x is placed in bucket h1(x);
  if bucket h1(x) is already taken (a collision),
  then try bucket h2(x) (rehashing);
  if bucket h2(x) is taken, then try bucket h3(x); …

Member x:
  try buckets h1(x), h2(x), … until x is found or an empty bucket is met.
130
Example 2 – Sorting using a priority queue
(key ≡ priority)

procedure PQSort (var A : array [1..n] of …);
var pool : PRIORITY QUEUE of …;
    i : integer;
begin
  MAKENULL(pool);
  for i := 1 to n do
    INSERT(A[i], pool);
  for i := 1 to n do
    A[i] := DELETEMIN(pool)
end;

Obs: if INSERT and DELETEMIN are O(log n), then PQSort is O(n log n).
131
Previous implantation of sets
Bit -vector – O (N) DELETEMIN
Array -
O (n) INSERT & DELETEMIN
Linked list – unsorted O(n) DELETEMIN
- sorted O(n) INSERT
Hashing - DELETEMIN O(n)
Solution – heap partially ordered tree in [AHU]
1
3
2
parent ≤ child
3
5
9
4
5
6
8
7
6
8
9
10
10
3
4
3
9
9
9
18
1 2
10
4
6
5 6
8
9
7 8 9
10 10 18
132
10 11
9
DELETEMIN:

1. Remove the root (the minimum) and move the last element to the root.
2. Push the new root down: swap it with its smaller child as long as it is
   larger than one of its children.

   9,5,9,6,8,9,10,10,18  →  5,9,9,6,8,9,10,10,18  →  5,6,9,9,8,9,10,10,18

Generally, time = O(depth of tree) = O(log n)
(since 2^depth ≤ n < 2^(depth+1)).
INSERT(4, heap):

1. Place 4 in the first free position (a new leaf).
2. Bubble it up: swap it with its parent as long as it is smaller than its parent.

   3,5,9,6,8,9,10,10,18,9,4  →  3,5,9,6,4,9,10,10,18,9,8  →  3,4,9,6,5,9,10,10,18,9,8

time = O(depth) = O(log n)
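An array-based heap sketch in C matching the picture above (heap[1..n], parent at i/2, children at 2i and 2i+1, parent ≤ children); the fixed capacity and names are ours, and main replays the lecture example.

#include <stdio.h>

#define MAXHEAP 100
static int heap[MAXHEAP + 1];                   /* heap[1..n] */
static int n = 0;

void insert(int x) {                            /* bubble up: O(log n) */
    int i = ++n;
    heap[i] = x;
    while (i > 1 && heap[i] < heap[i / 2]) {
        int t = heap[i]; heap[i] = heap[i / 2]; heap[i / 2] = t;
        i = i / 2;
    }
}

int deletemin(void) {                           /* push down: O(log n) */
    int min = heap[1];
    heap[1] = heap[n--];                        /* move the last element to the root */
    int i = 1;
    while (2 * i <= n) {
        int c = 2 * i;                          /* pick the smaller child */
        if (c + 1 <= n && heap[c + 1] < heap[c]) c = c + 1;
        if (heap[i] <= heap[c]) break;
        int t = heap[i]; heap[i] = heap[c]; heap[c] = t;
        i = c;
    }
    return min;
}

int main(void) {
    int keys[] = {3, 5, 9, 6, 8, 9, 10, 10, 18, 9};   /* the example heap */
    for (int i = 0; i < 10; i++) insert(keys[i]);
    insert(4);                                         /* INSERT(4, heap) */
    for (int i = 0; i < 11; i++) printf("%d ", deletemin());
    printf("\n");                        /* prints 3 4 5 6 8 9 9 9 10 10 18 */
    return 0;
}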
A linear order’<’ is a relation on elements;
(i) for any two elements a and b
a < b, a > b, or a = b
(ii) a<b and b<c
 a <c
A set is ordered if a linear order’<’ on its members exists, e.g. sets of
integers
reals
character strings (by lexicographical order)
Note: the appearance order of elements in a set representation is
unimportant, e.g. (1,3,4) = (3,1,4} = {4,3,1}
A sorted list:
a1,a2 a3 a4,…. an-1, an
135
Representing ordered sets – binary search trees

Elements are ordered by '<'.
Operations of interest: MAKENULL, INSERT, MEMBER, DELETE, MIN.

Previous implementations:
  sorted linked list:  MEMBER – O(n)
  sorted array:        INSERT, DELETE – O(n)

Solution: binary search tree — at every node,
  left subtree < parent < right subtree.

[Figure: a binary search tree with root 20; 20's children are 15 and 30;
15's children are 10 and 17 (17 has left child 16); 30's children are 25 and
45 (25 has right child 28).]
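A short C sketch of the binary search tree (the example in main builds the tree pictured above; duplicates are ignored, names are ours):

#include <stdio.h>
#include <stdlib.h>

typedef struct Node { int key; struct Node *left, *right; } Node;

Node *insert(Node *t, int x) {                  /* O(depth) */
    if (t == NULL) {
        t = malloc(sizeof *t);
        t->key = x; t->left = t->right = NULL;
    }
    else if (x < t->key) t->left  = insert(t->left,  x);
    else if (x > t->key) t->right = insert(t->right, x);
    return t;
}
int member(const Node *t, int x) {              /* O(depth) */
    if (t == NULL)   return 0;
    if (x == t->key) return 1;
    return member(x < t->key ? t->left : t->right, x);
}
void inorder(const Node *t) {                   /* prints the set in sorted order */
    if (t == NULL) return;
    inorder(t->left); printf("%d ", t->key); inorder(t->right);
}

int main(void) {
    int keys[] = {20, 15, 30, 10, 17, 25, 45, 16, 28};
    Node *t = NULL;
    for (int i = 0; i < 9; i++) t = insert(t, keys[i]);
    inorder(t);  printf("\n");                  /* 10 15 16 17 20 25 28 30 45 */
    printf("%d %d\n", member(t, 17), member(t, 40));   /* 1 0 */
    return 0;
}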
Pascal Implementation
type
elementtype = record
key:real;
data:datatype
end;
nodetypes = (leaf, interior)
I
twothreernode = record
case kind : nodetypes of
leaf: (element:elementtype);
nterior (first,second,third:
↑ twothreenode;
lowofseconcd,lowofthird:real
end;
SET = ↑ twothreenode
need parent: ↑ Twothreenode?
2-3 three: 3-way B-tree
137
AVL trees (Adelson-Velskii, Landis)

Balanced binary search trees  [HS pp. 436-452]

[Figure: a tree whose left and right subtrees are AVL trees of depths dL and dR.]

  |dL − dR| ≤ 1

The empty tree and single nodes are also AVL trees.
An AVL tree is also called a height-balanced (or depth-balanced) binary tree.

Balance factor BF = dL − dR.

[Figure: an AVL search tree with keys 2, 7, 8, 10, 12, 15, 19, the balance
factor (+1, 0 or −1) written at each node.]

n_d : minimum number of nodes in an AVL tree of depth d.

Fact:  n_0 = 1,  n_1 = 2,  n_d = n_{d-1} + n_{d-2} + 1
(similar to the Fibonacci numbers F_d = F_{d-1} + F_{d-2})

n_d ≥ F_d ≈ c^d / √5,   where c = (1 + √5)/2 > 1
⟹ d ≤ log_c n_d + log_c √5,  i.e.  depth = O(log n)

MEMBER, INSERT, DELETE — O(log n)
139
Sets with MERGE and FIND

MERGE(A, B, C): if A ∩ B = ∅ then C := A ∪ B
environment: disjoint sets A1, A2, …, Am
FIND(x): the unique Ai s.t. x ∈ Ai

Example – the equivalence problem:

An equivalence relation '≡' on a set S satisfies
1. a ≡ a                      (reflexivity)
2. a ≡ b  ⟹  b ≡ a            (symmetry)
3. a ≡ b, b ≡ c  ⟹  a ≡ c     (transitivity)

e.g. congruence modulo K:  i ≡_K j  iff  (i − j) mod K = 0

Equivalence classes:  S = S1 ∪ S2 ∪ S3 ∪ …
such that  a, b ∈ Si ⟹ a ≡ b,  and  a ∈ Si, b ∈ Sj, i ≠ j ⟹ a ≢ b

e.g.  {0, K, 2K, …}, {1, K+1, 2K+1, …}, …, {K−1, 2K−1, …}
140
s = {a1,a2,a3,a4,a5,a6,a5,a6,a7}
Fortran: EQUIVALENCE
.
.
.
a11≡a12
a13≡a14
.
.
.
{ a1 } { a2 } { a3 } { a4 } { a 5 } { a6 } { a7 }
a1≡a1
{ a1, a2}
{ a3 } { a4 } { a5 } { a6 } { a7 }
a5≡a6
{ a1, a2}
{ a3} { a4} { a5, a6}
{ a7 }
a3≡a5
{ a1, a2}
{ a3, a5, a6} { a4}
{ a7 }
a4≡a7
{ a1, a2}
{ a3,a5, a6} { a4 a7}
ai≡aj
A = FIND(ai; B = FIND(aj );
MERGE (A,B,A);
MAKENULL(B);
∪ ={ a1, a2, …., an }
=A1∪A2 ∪…∪=Am
A Partition
ADT MFSET { A1, A2,… Am } component
1. MERGE(A,B): A:=A∪B or B:=A∪B
2. FIND(X)
3. INITIAL(A,x): A:={x}
141
A simple implementation
element-based
Type
MFSET = array[membertype]of set-id-type
∪ = {1, 2, …, 12}
1 2 3 4 5 6 7 8
9
10 11 12
2 1 1 2 3
2
4
1 4 3
4 3
= {(2, 3, 6}, {1, 4, 9}}
{5, 8,12}, {7, 10, 11}}
type
set-id-type = integer
membertype = 1…n
function
FIND(x:1..n; var C:MFSET);
Begin
O(1)
FIND := c(x)
End
Procedure MERGE (A, B:integer; var C:MFSET); // A: A∪B//
Var
X:1..n;
Begin
For x:=1 to n do
If C[x] = B then
O(n)
C[x] :=A
End;
142
By some minor improvement
N merges can be done in O(nlogn) time using member list).
A tree implementation
component-based
A
B
1
1
C
5
7
1
6
1
A = {1, 2, 3, 4}
B={5, 6}
MERGE (A,B)
A
C= {7}
1
2
5
3
6
4
Time=0(1)
143
FIND(x) – O(depth of the tree containing x).

* Weight rule: if we always merge the smaller tree into the larger tree,
  then depth ≤ log₂ n.
  (The root must contain the weight, i.e. the size, of its tree.)

Path compression: after FIND(x), make every node on the path from x to the
root point directly to the root.

[Figure: FIND(6) on a tree where 6 hangs off a long path 1–3–5–…; afterwards
every node on that path points directly to the root 1.]

With path compression only:
  n consecutive FINDs            – O(n) time
  n intermixed FINDs and MERGEs  – O(n log n)

With both path compression and the weight rule (*):
  n intermixed FINDs and MERGEs  – O(n·α(n)),

where α(n) = the least m s.t. n ≤ A(m, m), the pseudo-inverse of Ackermann's
function.  In practice α(n) ≤ 4, since A(4, 4) is a tower 2^2^…^2 of 65536 twos.
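An MFSET sketch in C with both ideas above: FIND compresses the path, MERGE links the smaller tree under the larger (weight rule). N and the names are ours; main replays the equivalence example.

#include <stdio.h>

#define N 16
static int parent[N + 1];                  /* parent[x] = x for a root */
static int size[N + 1];                    /* only meaningful at roots */

void initial(void) {
    for (int x = 1; x <= N; x++) { parent[x] = x; size[x] = 1; }
}
int find(int x) {                          /* with path compression */
    if (parent[x] == x) return x;
    parent[x] = find(parent[x]);           /* point directly at the root */
    return parent[x];
}
void merge(int a, int b) {                 /* union by size (weight rule) */
    a = find(a); b = find(b);
    if (a == b) return;
    if (size[a] < size[b]) { int t = a; a = b; b = t; }
    parent[b] = a;                         /* smaller tree under larger */
    size[a] += size[b];
}

int main(void) {
    initial();
    merge(1, 2); merge(5, 6); merge(3, 5); merge(4, 7);
    printf("%d %d\n", find(3) == find(6), find(1) == find(4));   /* 1 0 */
    return 0;
}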
Ordered Sets with MERGE, FIND, SPLIT

SPLIT(S, S1, S2, x):
  S1 := {a | a ∈ S and a < x}
  S2 := {a | a ∈ S and a ≥ x}

Longest Common Subsequence (LCS) Problem

Sequence = string, e.g. abcdaaa.
A subsequence of a sequence x is obtained by removing zero or more (not
necessarily contiguous) characters from x;
e.g. ab, aaaa, ada are subsequences of the sequence above.

Longest common subsequence of x and y: a longest sequence that is a
subsequence of both x and y.

e.g.  x = 1214321,  y = 25134121
      21421 is an LCS; 21321 is another one.

Applications: UNIX diff, DNA analysis, etc.

Solutions (|x| = n, |y| = m):
1. dynamic programming – O(nm)
2. O(p log n), where p is the size of
   {(i, j) | 1 ≤ i ≤ n, 1 ≤ j ≤ m, and xi = yj};
   worst case p = O(mn), in practice p = O(m+n).
147
Key idea
Input : A = a1a2…an
B = b1b2…bm
To find | LCS(A,B) |
For j:=1 to m do
Find | LCS( a1….ai,bi….bj) |
Def.
Sk = {i | |LCS(a1…aib1…bj)| = k
A=
12345678
1214321
B=
25134121
s1
s2
J
s0
s3
s4
s5
s6
s7
1
{1} {2,3,4,5,6,7} ∅
∅
∅
∅
∅
2
{1} {2,3,4,5,6,7} ∅
∅
∅
∅
∅
3
∅ {1,2} {3,4,5,6,7} ∅
∅
4
∅ {1,2} {3,4} {5,6,7} ∅
5
∅ {1,2} {3} {4, 5,6,7} ∅
6
∅
7
∅
148
8
∅
Def.  PLACES(a) = {i | 1 ≤ i ≤ n, ai = a}.

All the PLACES(a) can be obtained in O(n) time, assuming the alphabet is
finite (if not, O(n log n), or hashing).

e.g. PLACES(a) = {i1, i2, …, ik} is kept as a list with i1 > i2 > … > ik.

Intuitive fact: in iteration j (i.e. when considering bj), new matches
happen at PLACES(bj) in A.  These matches may move a position from Sk to Sk+1.

Rule: move r from Sk to Sk+1 (in iteration j) iff
  1. ar = bj  (i.e., r ∈ PLACES(bj)), and
  2. r−1 ∈ Sk.
149
Procedure LCS;
begin
  initialize S0 := {1, 2, …, n} and Si := ∅ for i = 1, 2, …, n;
  for j := 1 to m do
    { compute the Sk's for position j }
    for r in PLACES(bj) do begin
      k := FIND(r);
      if k = FIND(r-1) then begin
        SPLIT(Sk, Sk, S'k, r);
        MERGE(S'k, Sk+1, Sk+1)
      end
    end
end;

Obs: if FIND, MERGE and SPLIT can each be done in O(log n) time, then the
total time is

  O( Σ_{j=1}^{m} |PLACES(bj)| · log n ) = O(p·log n).
150
Data structure for sets S0, S1, …, Sn
2 -3 trees ! ! !
8
K
9
10
11
12
13
14
FIND ( r) : O(depth) = O(logn)
MERGES (S’k, Sk+1, Sk+1):
New Sk+1
S’k
New Sk+1
Sk+1
Sk+1
S’k
Similar to INSERT, repair
Takes O(logn) time
151
S’k
Sk+1
APLIT ( )
6
7
8
9
10
11
12
r=9
split at 9
8
6
10
7
12
9
9
6
11
7
8
time = O(logn)
152
10
11 12
Graphs: A Math Model
HW401
Hw401
Waterloo
Toronto
London
HW6
QEW
QEW
Hw403
Hamilton
Niagara falls
Toronto
Minneapolis
New York
Chicago
New Orleans
Miami
Flight Map
(Imaginary)
KNOW
Bob
Mary
Mary
y
Bob
friends
Alex
Mark
Alex
Mark
Sandy
Sandy
Misc: state transition diagrams
153
Directed Graphs (Digraphs)
V1 = {1, 2, 3, 4, 5}
E1 = {(1,2), (1,3), (2,3),(3,4),(4,5),(4,1),(5,1), (5,4)}
G1 : = (V1, E1)
1
A digraph G = (V, E)
5
2
V: set of verices/nodes
3
4
E: set of arcs/directed edges
The arc from vertex v to vertex w:
(v,w) v≠w
V➙w or
Tail
head
w is adjacent to v
|V| = n
|E| ≤ n(n-1)
= O(n2)
A path v1,v2, …vm s.t. the arcs (v1,v2,(v2,v3),…,(vm-1,vm)exist.
Length of the path : m-1
The path passes through v2, v3, …,vm-1
The path is simple if all vertices on the path are distinct, except possibly the
first and last.
154
(Simple) cycle:
a (simple) path of length at least one that begins and
ends at the same vertex.
e.g.
1, 2, 3, 4, 1, 3
is a path
1, 2, 3, 4, 5
is a simple path
1, 2, 3, 4, 5, 1
is a simple cycle
1, 2, 3, 4, 1, 3, 4
is a simple cycle
labelled diagraph
a
b
a
b
b
b
abab
abbaaaba
...
a
a
When the labels are numbers, the diagraph is also called a network or
weighted diagraph.
155
Representation of digraphs

1. List of edges, e.g. (1,2), (1,3), (2,3), …

2. Adjacency matrix
   G = (V, E), V = {1, 2, …, n}.  The adjacency matrix for G is an n × n
   Boolean matrix with
     A[i, j] = true (1)   if (i, j) ∈ E
             = false (0)  otherwise
   Space: O(n²) even if |E| << n².

3. Adjacency list
   [Figure: for each vertex 1 … 5 of G1, a linked list of the vertices
   adjacent to it:  1 → 2, 3;  2 → 3;  3 → 4;  4 → 5, 1;  5 → 1, 4.]
   Space: O(|E|); but to decide whether i → j we need O(n) time.
ADT DIGRAPH

Single-source shortest paths problem:

Given G = (V, E), arc costs and a source vertex, determine the cost of the
shortest path from the source to every other vertex.

[Figure: a digraph on vertices 1 … 6 with arc costs such as 15, 40, 100, 50,
20, 10, 18, 30; the labels (costs) must be ≥ 0, and a missing arc has cost
+∞, e.g. cost(2, 1) = +∞.]

Cost(v1, v2, …, vn) = Σ_{i=1}^{n-1} cost(vi → vi+1)

e.g. source = 1:
  to        2   3   4   5   6
  min cost  70  60  40  10  30
Dijkstra's algorithm

Source vertex = 1,  G = (V, E),  V = {1, 2, …, n}
D(i) = cost of the shortest path from 1 to i.

Let S ⊆ V be a set of vertices containing 1;
D_S(i) = cost of the shortest path from 1 to i that only passes through
vertices of S.  (S is called a restriction set.)

[Figure: arcs 1→3 of cost 10, 1→2 of cost 4, 2→3 of cost 5;
D(3) = 9, but D_S(3) = 10 if S = {1}.]

Fact:  D_V(i) = D(i)   and   D_{{1}}(i) = cost(1 → i).

Idea: let S ⊆ V be some set s.t. 1 ∈ S, and suppose we know D_S(i) for each
i ∈ V.  Then we can enlarge S as follows:
1. pick w ∈ V−S such that D_S(w) is minimum;
2. S := S ∪ {w};
3. D_S(i) := min( D_S(i),  D_S(w) + cost(w, i) ).

[Figure: the shortest path to w leaves S exactly once, at w, since
D_S(x) ≤ D_S(w) for any x ∈ S.]

Algorithm   { D[i] ≡ D_S(i) }
begin
  S := {1};
  for i := 1 to n do
    D[i] := cost(1, i);
  for i := 1 to n-1 do begin
    find w in V−S s.t. D[w] is a minimum;
    S := S ∪ {w};
    for j := 2 to n do
      D[j] := min(D[j], D[w] + cost(w, j))
  end
end;

Obs.  D[i] = D(i) if i ∈ S  (D_S(i) = D(i) for i ∈ S);
thus there is no need to update D[i] if i ∈ S.
Example
10
2
20
3
10
60
10
30
10
1
6
40
10
4
5
30
50
2
6
3
30
60 = d[3]
1
0
5
10
1
0
5
10
+∞
40
6
3
2
30
4
3
60
230
60
4
1
0
+∞
5
40
160
10
4
40
6
+∞
6
3
2 30
1
1
0
5
10
6
3
2 30
+∞
4
40
0
5
10
161
+∞
4
40
procedure Dijkstra;
{ C[i, j] = cost(i, j) }
begin
1.  S := {1};
    for i := 2 to n do
      D[i] := C[1, i];
2.  for i := 1 to n-1 do begin
      find a w in V−S such that D[w] is a minimum;
      S := S ∪ {w};
      for each vertex v in V−S do
        D[v] := min(D[v], D[w] + C[w, v])
    end
end;

How do we recover the shortest paths themselves?

Time = O(n²).

With adjacency lists of costs and a priority queue for V−S:
time O(|E| log n).
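A direct O(n²) C sketch of the procedure above on a cost matrix (INF plays the role of +∞, vertices are 1..n, source is 1; the example graph and all names are made up for illustration):

#include <stdio.h>

#define NV  6
#define INF 1000000

int C[NV + 1][NV + 1];      /* C[i][j] = cost(i,j), INF if no arc */
int D[NV + 1];              /* D[i] = cost of the shortest path 1 -> i */
int inS[NV + 1];            /* membership in S */

void dijkstra(int n) {
    for (int i = 1; i <= n; i++) { D[i] = C[1][i]; inS[i] = 0; }
    inS[1] = 1;  D[1] = 0;
    for (int k = 1; k < n; k++) {
        int w = 0;                                 /* pick w in V-S with minimum D[w] */
        for (int v = 1; v <= n; v++)
            if (!inS[v] && (w == 0 || D[v] < D[w])) w = v;
        inS[w] = 1;
        for (int v = 1; v <= n; v++)               /* relax arcs leaving w */
            if (!inS[v] && D[w] + C[w][v] < D[v])
                D[v] = D[w] + C[w][v];
    }
}

int main(void) {
    for (int i = 1; i <= NV; i++)
        for (int j = 1; j <= NV; j++) C[i][j] = (i == j) ? 0 : INF;
    C[1][2] = 10; C[1][4] = 30; C[1][6] = 100;
    C[2][3] = 50; C[3][6] = 10; C[4][3] = 20; C[4][6] = 60; C[6][5] = 5;
    dijkstra(NV);
    for (int i = 2; i <= NV; i++) printf("D[%d]=%d ", i, D[i]);
    printf("\n");     /* D[2]=10 D[3]=50 D[4]=30 D[5]=65 D[6]=60 */
    return 0;
}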
All-Pairs Shortest Paths Problem
Given: a digraph with nonnegative arc costs
Goal: for each pair v, w of vertices find the cost of the shortest path from v
to w.
Application: construction of shortest flying time table
Solution 1: repeat Dijkstra’s algorithm with source = 1 ,2,...,n
time: 0(n3) or 0(n | E | logn)
Solution 2: Floyd’s algorithm
let D(i,j) and Ds(i,j) be as before
D(i,j): distance from i to j
D distance from i to j under restrictions.
163
Floyd's Idea:

Let Sk = {1, 2, …, k}, 0 ≤ k < n, and suppose D_{Sk}(i, j) is known for all
1 ≤ i, j ≤ n.  Then, with Sk+1 = {1, 2, …, k+1},

  D_{Sk+1}(i, j) = min( D_{Sk}(i, j),  D_{Sk}(i, k+1) + D_{Sk}(k+1, j) )

for all 1 ≤ i, j ≤ n.  Thus we compute

  D_{S0}(i, j) = cost(i, j),  D_{S1}(i, j),  …,  D_{Sn}(i, j) = D(i, j).
164
In the following procedure, A is an n×n matrix with
A[i, j] = D_{Sk}(i, j) after the k-th iteration.

procedure Floyd (var A : array [1..n, 1..n] of real; C : …);
var i, j, k : integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      A[i, j] := C[i, j];                   { A = D_{S0} }
  for i := 1 to n do
    A[i, i] := 0;
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        if A[i, k] + A[k, j] < A[i, j] then
          A[i, j] := A[i, k] + A[k, j]
        { A[i,j] := min(A[i,j], A[i,k] + A[k,j]) }
end;

time = O(n³)
165
Recovering the paths
Use an n×n matrix P; initially P[i,j] := 0 for 1 ≤ i, j ≤ n.
In procedure Floyd, replace the inner update with:
  if A[i,k] + A[k,j] < A[i,j] then begin
    A[i,j] := A[i,k] + A[k,j];
    P[i,j] := k
  end
Meaning: the shortest path from i to j passes through vertex k.
procedure path(i, j: integer);
// print a shortest path from i to j //
var k: integer;
begin
  k := P[i,j];
  if k ≠ 0 then begin   // the path is not direct //
    path(i,k);
    writeln(k);
    path(k,j)
  end
end;
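For illustration only, here is a small Python sketch (mine, not the notes') of Floyd's algorithm together with the path-recovery matrix P described above. The cost matrix C is a hypothetical input with math.inf for missing arcs.

import math

def floyd(C):
    """All-pairs shortest path costs A and intermediate-vertex matrix P."""
    n = len(C)
    A = [row[:] for row in C]
    P = [[-1] * n for _ in range(n)]      # -1 plays the role of "0 = direct"
    for i in range(n):
        A[i][i] = 0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:
                    A[i][j] = A[i][k] + A[k][j]
                    P[i][j] = k           # shortest i -> j path passes through k
    return A, P

def path(P, i, j, out):
    """Append the intermediate vertices of a shortest i -> j path to out."""
    k = P[i][j]
    if k != -1:                            # the path is not direct
        path(P, i, k, out)
        out.append(k)
        path(P, k, j, out)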
166
Transitive closure of adjacency matrix
Given: digraph G = (V,E) represented by adjacency matrix C
Goal: for each pair i,j, decide whether there exists a path from i to j:
A[i,j] = 1 (true) if there is a path from i to j, 0 (false) otherwise, 1 ≤ i, j ≤ n
A is called the transitive closure of C.
Solution 1: Use Floyd's algorithm.
Initialize A[i,j] := +∞ if C[i,j] = 0, and A[i,j] := 1 otherwise.
At the end, set A[i,j] := 1 if A[i,j] ≠ +∞, and 0 if A[i,j] = +∞.
Solution 2: Simplified Floyd's algorithm (Warshall's algorithm)
in iteration k:  A[i,j] := A[i,j] or (A[i,k] and A[k,j])
167
procedure Warshall(var A: array[1..n, 1..n] of boolean; C: ...);
var i,j,k: integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      A[i,j] := C[i,j];
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        if A[i,j] = false then
          A[i,j] := A[i,k] and A[k,j]
end;
time = O(n³)
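For comparison, a minimal Python sketch of Warshall's algorithm (my own, not from the notes), operating on a hypothetical Boolean adjacency matrix.

def warshall(C):
    """Transitive closure of a Boolean adjacency matrix C (list of lists)."""
    n = len(C)
    A = [row[:] for row in C]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if not A[i][j]:
                    A[i][j] = A[i][k] and A[k][j]
    return A

# hypothetical 3-vertex digraph 0 -> 1 -> 2
C = [[False, True, False],
     [False, False, True],
     [False, False, False]]
print(warshall(C)[0][2])   # True: there is a path from 0 to 2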
A = C + C·C + C·C·C + … + C^(n−1)
'·' is Boolean matrix multiplication (and / or).
It is known that C·C can be done in O(n^2.376) time.
To obtain A, compute C², C⁴, C⁸, …, up to C^(n−1)  (about log₂n products).
time: O(log n · n^2.376) < O(n³)
168
Undirected graphs
[Figure: an undirected graph on vertices 1-5]
G = (V,E)
E = {(1,2), (2,3), (3,4), (4,5), (5,1), (1,3), (2,5)}
(u,v) and (v,u) denote the same edge.
(u,v) is incident upon u and v.
v1, v2, …, vn is a path if the edges (v1,v2), (v2,v3), …, (vn−1,vn) exist.
The path v1, v2, …, vn connects v1 and vn.
Definitions of simple path and cycle are the same as for digraphs, except that a cycle must have length ≥ 3.
G1 = (V1,E1) is a subgraph of G2 = (V2,E2) if V1 ⊆ V2 and E1 ⊆ E2.
If E1 contains all edges (u,v) in E2 such that u,v ∈ V1, G1 is called an induced subgraph of G2.
[Figure: a graph on vertices 1, 2, 3 and one of its induced subgraphs]
169
Graph G is connected if every pair of G’s vertices is connected by some
path
Connected component of G: a maximal connected induced subgraph of G
G is cyclic if G contains at least one cycle
G is acyclic if G doesn’t contain any cycles
Free tree:
a connected acyclic graph
Fact:
1. Every n-node free tree has n−1 edges
2. If we add any edge to a free tree, we get a cycle
Claim:
If n>1, there must be a vertex with degree (i.e., number of
edges incident upon the vertex) =1
170
Proof of claim
Let G be a free tree with > 1 node.
Suppose that G's nodes all have degree > 1.
[Figure: a walk v1, v2, v3, …, vi, vi+1, vi+2, … that never has to stop, since every vertex has another edge to leave by; some vertex must eventually repeat]
∴ a cycle exists. A contradiction!!
Proof of (1): true if n = 1.
Suppose (1) is true for n = k.
Let G = (V,E) be a (k+1)-node free tree.
Let u be a vertex of degree 1 and (u,w) its only incident edge.
G' = (V−{u}, E−{(u,w)}) is a free tree.
By the induction hypothesis, G' has k−1 edges.
∴ G has k edges.
Proof of (2): if adding an edge creates no cycle, then the graph is still a free tree, but its number of edges is n. Contradiction!!
171
Representation
Adjacency matrix:
symmetric, i.e. entry i,j = entry j,i
Adjacency list:
redundancy, i.e. if edge (u,v) exists, then u is on
the list for v and v is on the list for u.
Minimum-cost spanning tree
G = (V,E) is connected. Each edge (u,v) ∈ E has a cost C(u,v) (= C(v,u)).
A spanning tree of G is a subgraph of G which is a free tree connecting all vertices in V.
The cost of a spanning tree is the sum of the costs of the edges in the tree.
[Figure: a weighted graph (edge costs 3, 8, 11, 13, 15, 20, 20, 30, 30) and one of its spanning trees]
172
The MST Property:
Let G = (V,E) be a connected graph
Let U ⊆ V be a proper subset of V.
If (u,v) is an edge of lowest cost s.t. u ∈ U and v ∈ V−U, then there is a minimum-cost spanning tree that includes (u,v) as an edge.
[Figure: the cut between U and V−U; edges (u,v) and (u′,v′) both cross the cut, with C(u,v) ≤ C(u′,v′)]
procedure Prim(G: graph; var T: set of edges);
// constructs a minimum-cost spanning tree T //
var
  U: set of vertices;
  u, v: vertex;
begin
  T := ∅;  U := {1};
  while U ≠ V do begin
    find a lowest-cost edge (u,v) s.t. u ∈ U and v ∈ V−U;
    T := T ∪ {(u,v)};
    U := U ∪ {v}
  end
end;
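A minimal Python sketch of Prim's procedure (my illustration, not the notes'), assuming the graph is given as a hypothetical symmetric cost matrix with math.inf for missing edges and vertex 0 in place of vertex 1.

import math

def prim(C):
    """Return the edges of a minimum-cost spanning tree of a connected graph."""
    n = len(C)
    U = {0}                      # plays the role of U := {1}
    T = []
    while len(U) < n:
        # lowest-cost edge (u, v) with u in U and v in V - U
        u, v = min(((u, v) for u in U for v in range(n) if v not in U),
                   key=lambda e: C[e[0]][e[1]])
        T.append((u, v))
        U.add(v)
    return T

INF = math.inf
C = [[INF, 1, 3, INF],
     [1, INF, 2, 5],
     [3, 2, INF, 4],
     [INF, 5, 4, INF]]
print(prim(C))   # [(0, 1), (1, 2), (2, 3)]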
173
An example:
[Figure: Prim's algorithm on a 5-vertex weighted graph; U grows as {1}, {1,2}, {1,2,5}, {1,2,3,5}, {1,2,3,4,5}, and the chosen edges form the minimum-cost spanning tree]
174
Kruskal's algorithm
("connected component" below means a connected component w.r.t. the edges already in T)
procedure Kruskal (G:graph;var T:set of edges);
var u,v : vertex;
E’ : set of edges;
begin
E’ := E;
T := ∅ ;
while E’ ≠ ∅ do begin
find a lowest cost edge (u,v) in E’;
E’ := E’ – {(u,v)};
If u and v are not in the same connected component then
T:= T∪{(u,v)};
end
end;
Implementation of the component test:
K1 := FIND(u);  K2 := FIND(v);
if K1 ≠ K2 then MERGE(K1, K2) …        O(α(n)) amortized time per operation
E′: PRIORITY QUEUE;  the components w.r.t. T: MFSET
Time: O(e log e), e = |E|
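A Python sketch of Kruskal's procedure (my own, under simplifying assumptions): a small union-find with path compression stands in for FIND/MERGE on the MFSET, and a sorted edge list stands in for the priority queue.

def kruskal(n, edges):
    """edges: list of (cost, u, v) with vertices 0..n-1; returns MST edges."""
    parent = list(range(n))

    def find(x):                       # FIND with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    T = []
    for cost, u, v in sorted(edges):   # lowest-cost edge first
        k1, k2 = find(u), find(v)
        if k1 != k2:                   # u and v are in different components
            parent[k1] = k2            # MERGE
            T.append((u, v))
    return T

edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]   # hypothetical
print(kruskal(4, edges))   # [(0, 1), (1, 2), (2, 3)]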
175
Example:
[Figure: Kruskal's algorithm on the same 5-vertex weighted graph, examining edges in order of increasing cost]
add (3,5) to T
add (4,5); discard (3,4)
add (2,5); discard (2,3)
add (1,2); discard (1,3), (1,5)
176
Graph Traversal and Search
Digraphs - depth-first search
go as far as you can following the arcs!
type
  digraph = array[1..n] of ↑ adjacency list;
  vertex = 1..n;
var
  v: vertex;
  mark: array[vertex] of (visited, unvisited);

for v := 1 to n do mark[v] := unvisited;
for v := 1 to n do
  if mark[v] = unvisited then dfs(v);      // total time O(e) //
procedure dfs(v: vertex);
var w: vertex;
begin
  mark[v] := visited;
  print v;   // or anything //
  for each vertex w on L[v] do
    if mark[w] = unvisited then
      dfs(w)
end;
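A Python sketch of the same depth-first traversal (my illustration), with the digraph represented as a hypothetical dict of adjacency lists.

def dfs_all(graph):
    """graph: dict vertex -> list of successors. Returns vertices in DFS order."""
    mark = {v: 'unvisited' for v in graph}
    order = []

    def dfs(v):
        mark[v] = 'visited'
        order.append(v)                 # stands in for "print v"
        for w in graph[v]:
            if mark[w] == 'unvisited':
                dfs(w)

    for v in graph:                     # restart at every unvisited vertex
        if mark[v] == 'unvisited':
            dfs(v)
    return order

g = {1: [2, 3], 2: [4], 3: [], 4: [1]}   # hypothetical digraph
print(dfs_all(g))   # [1, 2, 4, 3]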
177
Example
[Figure: a 10-vertex digraph]
DFS order : 1, 2, 4, 7, 10, 5, 3, 6, 8, 9
Depth-first spanning forest:
[Figure: the df spanning forest, with each vertex labelled by its dfnumber 1-10]
forward arc: ancestor → descendant, e.g. (3,8)
back arc: descendant → ancestor, e.g. (7,1)
cross arc: all the others, e.g. (7,4), (9,1)
178
Fact: if (v,w) is a
(1) tree/forward arc, then dfnumber(v) < dfnumber(w);
(2) back/cross arc, then dfnumber(v) > dfnumber(w).
An application- test for acyclicity
Fact: a digraph is cyclic iff a back arc is encountered in any DFS.
[Figure: a cycle v → … → w → v, where dfnumber(v) is the smallest on the cycle, so (w,v) is a back arc]
How do we spot a back arc?
In dfs, include a dfnumber for each node encountered. Also, keep the current path in an array.
179
Breadth-first search
go as broadly as possible
procedure bfs(v);
var Q: QUEUE of vertex;
    x, y: vertex;
begin
  mark[v] := visited;
  print v;   // or anything //
  MAKENULL(Q);
  ENQUEUE(v, Q);
  while not EMPTY(Q) do begin
    x := FRONT(Q);
    DEQUEUE(Q);
    for each vertex y adjacent to x do
      if mark[y] = unvisited then begin
        mark[y] := visited;
        ENQUEUE(y, Q)
      end
  end
end;
time = O(e)
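The corresponding Python sketch for breadth-first search (mine, using the same hypothetical adjacency-list representation and a deque as the queue).

from collections import deque

def bfs(graph, v, mark, order):
    """Visit every vertex reachable from v in breadth-first order."""
    mark[v] = 'visited'
    order.append(v)
    Q = deque([v])
    while Q:
        x = Q.popleft()                 # FRONT + DEQUEUE
        for y in graph[x]:
            if mark[y] == 'unvisited':
                mark[y] = 'visited'
                order.append(y)
                Q.append(y)             # ENQUEUE

g = {1: [2, 5], 2: [4, 6], 5: [3], 4: [7], 6: [], 3: [], 7: []}  # hypothetical
mark = {v: 'unvisited' for v in g}
order = []
bfs(g, 1, mark, order)
print(order)    # [1, 2, 5, 4, 6, 3, 7]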
[Figure: a 7-vertex digraph]
BFS order: 1, 2, 5, 4, 6, 3, 7
bfnumber, bf spanning forest
180
(Undirected) Graphs
DFS: very similar to DFS for digraphs.
[Figure: a 10-vertex undirected graph]
DFS order: 1, 2, 4, 3, 6, 5, 7, 8, 10 , 9
dfs spanning forest (if the graph is connected: a df spanning tree)
[Figure: the df spanning forest, with dfnumber(v) shown at each vertex]
Tree edges: the forest edges
Back edges: (1,4), (4,6), (2,5)
No cross edges!!!
181
BFS:
For the above graph, the BFS order is:
1,2,3,4,5,6,7,8,9,10
Applications of DFS and BFS:
1. Test for acyclicity
   acyclic iff no back edges
2. Test for connectivity
   connected iff there is only one tree in the DFS/BFS spanning forest;
   generally, each tree in the forest gives a connected component.
3. Biconnected components (next lecture)
182
Articulation points and biconnected components
Flight Map
[Figure: a flight-map graph on 8 vertices]
Articulation point: a vertex whose removal makes the remaining graph disconnected.
Def. A vertex v is called an articulation point (or cutpoint) if there exist vertices x, w s.t. x ≠ v, w ≠ v, x ≠ w, and v is on every path connecting x and w.
Def. A connected graph is biconnected if it does not have any articulation
points.
183
Fact: The following are equivalent:
1. G is biconnected
2. Deletion of any single vertex fails to disconnect G
3. Every pair of vertices is connected by two disjoint paths (n ≥ 3)
Def. A connected graph is k-connected if deletion of any k−1 vertices fails to disconnect the graph.
Def. A connected graph is k edge-connected if deletion of any k−1 edges fails to disconnect the graph.
Biconnected component (or bicomponent): a maximal induced biconnected subgraph; e.g. the above graph has 5 bicomponents.
[Figure: the 5 bicomponents of the flight-map graph]
184
Problem
Given a connected graph G, identify all its articulation points and
bicomponents.
Trivial algorithm: O(n·e). We want O(e)!
To identify the articulation points:
Step 1: Do a depth-first search of G.
Note:
1. there is a single df spanning tree (G is connected)
2. there are only tree and back edges
185
Fact:
1. A leaf cannot be an articulation point.
2. The root is an articulation point iff it has more than one child.
3. Let v be an interior node other than the root. v is an articulation point iff some subtree of v has no back edge incident upon a proper ancestor of v.
Obs. Let w be any proper descendant of v and (w,x) be a back edge. x is a proper ancestor of v iff dfnumber(x) < dfnumber(v).
Def.
low(v) = the smallest dfnumber of v or of any node reachable by
following a back edge from some descendent of v (including v itself).
186
[Figure: a df spanning tree on 11 nodes (labelled by dfnumber) with its back edges]
Dfnumber:  1  2  3  4  5  6  7  8  9  10  11
Low:       1  2  1  1  5  1  6  6  9  1   1

Low(v) = min of:
  dfnumber(v),
  dfnumber(x) s.t. (v,x) is a back edge,
  Low(y) for any child y of v
Step 2: Traverse the df spanning tree in postorder and compute low(v) for all nodes v.
Note: if v is a leaf,
  Low(v) = min( dfnumber(v), dfnumber(x) s.t. (v,x) is a back edge )
Step 3: Identify articulation points by traversing the tree in postorder.
(This step can be done in parallel with Step 2.)
187
An interior node v is an articulation point iff for some child w of v,
  low(w) ≥ dfnumber(v)
Step 4: In Step 3, whenever an articulation point v is found, delete the subtree rooted at w and output the bicomponent given by that subtree together with v.
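A compact Python sketch (my own, not the notes') of Steps 1-3: one DFS assigns dfnumbers, computes low(), and reports the articulation points of a connected undirected graph given as a hypothetical dict of adjacency lists.

def articulation_points(graph, root):
    """Return the set of articulation points of a connected undirected graph."""
    dfnumber, low, points = {}, {}, set()
    counter = [1]

    def dfs(v, parent):
        dfnumber[v] = low[v] = counter[0]
        counter[0] += 1
        children = 0
        for w in graph[v]:
            if w not in dfnumber:                  # tree edge
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if v != root and low[w] >= dfnumber[v]:
                    points.add(v)                  # interior articulation point
            elif w != parent:                      # back edge
                low[v] = min(low[v], dfnumber[w])
        if v == root and children > 1:
            points.add(v)                          # root with more than one child

    dfs(root, None)
    return points

g = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}   # hypothetical graph
print(articulation_points(g, 1))    # {3}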
[Figure: a graph on vertices A-H, its df spanning tree and back edges; the bicomponents are output as the subtrees rooted below the articulation points]
Time : O(e)
Matching in Graphs
188
[Figure: a bipartite graph pairing teachers with courses]
G = (V, E) is a graph
A matching in G is a set of edges with no two edges incident upon the same vertex.
A matching is maximal if the number of its edges is the maximum possible.
A matching is complete (perfect) if every vertex in V is an endpoint of some edge in the matching.
G is bipartite if V = V1 ∪ V2, V1 ∩ V2 = ∅, and each edge in E has one end in V1 and the other end in V2.
Problem:
Given a bipartite G, find a maximal matching in G.
Solution #1: (Brute force) Enumerate all possible matchings. Pick one with the largest number of edges.
Time: O(n!) = O(nn)
Solution #2: Augmenting paths
189
Time: O(ne)
e.g. M = {(2,7), (3,6), (4,9)}
Let M be a matching.
A vertex v is matched if it is an endpoint of an edge in M; e.g., 2, 3, 4, 6, 7, 9 are matched.
An augmenting path relative to M: a path connecting two unmatched vertices in which alternate edges of the path are in M.
e.g.  P1 = 1, 6, 3, 9, 4, 10   and   P2 = 5, 10
190
Fact: if P is an augmenting path relative to M, then M ⊗ P is a bigger matching.
e.g.
M ⊗ P1 = {(2,7), (1,6), (3,9), (4,10)}
M ⊗ P2 = {(3,6), (2,7), (4,9), (5,10)}
(⊗ is the exclusive-or on sets, i.e. A ⊗ B = (A−B) ∪ (B−A), the symmetric difference)
Fact: M is maximal iff there is no augmenting path relative to M.
Proof: "only if": straightforward.
"if": i.e., if M is not maximal then there must be an augmenting path.
Let N be a matching s.t. |N| > |M|.
Then each connected component of (V, N ⊗ M) must be one of the following:
1. a simple cycle with edges alternating between N and M (equal numbers from each)
2. an augmenting path relative to N (one more edge from M than from N)
3. an augmenting path relative to M (one more edge from N than from M)
4. a path with an equal number of edges from N and M
Since N ⊗ M has more edges from N than from M, some component must be of type 3, i.e. an augmenting path relative to M exists.
191
Algorithm
M := ∅;
repeat
  find an augmenting path P relative to M;
  M := M ⊗ P
until no more augmenting paths exist
[Figure: a bipartite graph with V1 = {1,…,5} and V2 = {6,…,10}]
192
[Figure/table: a trace of the algorithm on the graph above; the matching M ⊗ P after successive augmentations is {(1,6)}, then {(3,6),(1,8)}, then {(2,8),(1,6),(3,9)}, then {(2,8),(1,6),(3,9),(4,7)}]
Algorithm to find an augmenting path relative to matching M
// G = (V,E), V = V1 ∪ V2 //
Build an augmenting graph level by level as follows:
  level 0 := the unmatched vertices in V1;
  repeat
    level 2i+1 := new vertices that are adjacent to a vertex at level 2i through an edge not in M; also add the edge;
    level 2i+2 := new vertices that are adjacent to a vertex at level 2i+1 through an edge in M; also add the edge;
Stop when an unmatched vertex is added at an odd level, or no more vertices can be added (i.e. no augmenting path exists).
The path from that vertex back to any vertex at level 0 is an augmenting path.
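For illustration, here is a Python sketch (mine) of the overall matching algorithm. Instead of building the level graph explicitly, it uses the DFS variant of the augmenting-path search (Kuhn's algorithm), which explores the same alternating edges and produces a matching of the same maximum size; the adjacency lists are hypothetical.

def max_bipartite_matching(V1, adj):
    """adj: dict u in V1 -> list of neighbours in V2. Returns the matching for V1."""
    match = {}                         # vertex -> its partner, for both sides

    def augment(u, seen):
        # try to find an augmenting path starting at u in V1
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or v's current partner can be re-matched elsewhere
            if v not in match or augment(match[v], seen):
                match[u], match[v] = v, u
                return True
        return False

    for u in V1:
        augment(u, set())
    return {u: match[u] for u in V1 if u in match}

adj = {1: [6, 7, 8], 2: [6, 9], 3: [6, 7], 4: [9, 10], 5: [6]}   # hypothetical
print(max_bipartite_matching([1, 2, 3, 4, 5], adj))
# {1: 8, 2: 9, 3: 7, 4: 10, 5: 6}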
193
Example
[Figure: a bipartite graph with V1 = {1,…,5}, V2 = {6,…,12}, and the augmenting graph built level by level, starting from the unmatched vertices of V1 at level 0]
194
The process is very similar to BFS
Time: O(e) if adjacency lists are used
Internal Sorting
Internal:
data are stored in the main memory which is a RAM. Thus,
access to each data item takes constant time.
Data Item:
a record with one or more fields. One field contains the key of
the record.
'≤' is a linear ordering on keys (compare with '<').
Sorting: arrange a sequence of records so that the keys form a nondecreasing sequence:
r1, r2, …, rn  ⇒  ri1, ri2, …, rin  s.t.  ri1.key ≤ ri2.key ≤ … ≤ rin.key
195
Bubble Sort
Move the lighter records to the top.
for i := 1 to n−1 do
  for j := n downto i+1 do
    if A[j].key < A[j−1].key then
      swap(A[j], A[j−1])
In place.  Time: O(n²).  Bad input: a descending sequence.
Insertion Sort
Insert A[i] into A[1], A[2], ..., A[i−1] at its rightful position.
A[0].key := −∞;
for i := 2 to n do begin
  j := i;
  while A[j].key < A[j−1].key do begin
    swap(A[j], A[j−1]);
    j := j−1
  end
end;
In place.  Time: O(n²) on a descending sequence.
196
Selection Sort
Select the smallest record and place it at its rightful position.
for i := 1 to n−1 do begin
  select the smallest among A[i], …, A[n];
  swap it with A[i]
end
Time: O(n²).  In place.  Better than bubble sort when the records are large.
Shell Sort (diminishing-increment)
[Example: sort with increment 6, then increment 3, …]
Time: O(n^(3/2)), in place, for some increment sequences.
197
Heap Sort
Q:
PRIORITY QUEUE
for i:=1 to n do
INSERT (A[i],Q);
for i := n down to 1 do
A[i] := DELETEMIN(Q);
Time: O(nlogn)
in place if Q is implemented using the array A[1..n]
Details in [AHU]
Quick Sort
if A[i..j] contains two distinct keys then begin
  find the larger of the first two distinct keys, v (called the pivot);
  arrange A[i..j] so that, for some k with i+1 ≤ k ≤ j,
    A[i], …, A[k−1] < v  and  A[k], …, A[j] ≥ v;
  quicksort(i, k−1);
  quicksort(k, j)
end
198
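A short Python sketch (my own) of this quicksort variant: the pivot is the larger of the first two distinct keys, and the subarray is partitioned into keys < v and keys ≥ v before recursing.

def quicksort(A, i, j):
    """Sort A[i..j] in place (inclusive bounds)."""
    v = pivot(A, i, j)
    if v is None:
        return                          # fewer than two distinct keys
    k = partition(A, i, j, v)           # A[i..k-1] < v, A[k..j] >= v
    quicksort(A, i, k - 1)
    quicksort(A, k, j)

def pivot(A, i, j):
    """Larger of the first two distinct keys in A[i..j], or None."""
    for p in range(i + 1, j + 1):
        if A[p] != A[i]:
            return max(A[i], A[p])
    return None

def partition(A, i, j, v):
    """Rearrange A[i..j]; return k such that A[i..k-1] < v <= A[k..j]."""
    l, r = i, j
    while l <= r:
        while A[l] < v:
            l += 1
        while A[r] >= v:
            r -= 1
        if l < r:
            A[l], A[r] = A[r], A[l]
    return l

A = [5, 7, 2, 1, 4, 3, 9, 5, 1, 7]      # hypothetical input
quicksort(A, 0, len(A) - 1)
print(A)    # [1, 1, 2, 3, 4, 5, 5, 7, 7, 9]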
Example
[Figure: quicksort trace on 5 7 2 1 4 3 9 5 1 7; successive partitions (pivots 7, 5, 3, 2, 4, 9) yield the sorted list 1 1 2 3 4 5 5 7 7 9]
199
Worst-case time complexity: O(n²)
pivot(i,j): O(j−i+1)
partition(i,j,pivot): O(j−i+1)
T(n) = O(n) + T(n−1) = … = O(n²)
Not in place!  The stack space can be made O(log n).
Average time complexity
Assumptions:
1. all orderings are equally likely
2. the keys are distinct
Pr(first group has size i)
  = Pr(A[1] is the (i+1)-st smallest and A[1] is the pivot)
  + Pr(A[2] is the (i+1)-st smallest and A[2] is the pivot)
  = (1/n)(i/(n−1)) + (1/n)(i/(n−1)) = 2i/(n(n−1))
200
Tavg(1) = c0
Tavg(n) ≤ Σ_{i=1}^{n−1} [2i/(n(n−1))] · [Tavg(i) + Tavg(n−i)] + cn
        ≤ [2/(n−1)] · Σ_{i=1}^{n−1} Tavg(i) + cn
201
Suppose Tavg(i) ≤ k·i·log i for some constant k, 2 ≤ i < n. Then
Tavg(n) ≤ [2k/(n−1)] · Σ_{i=1}^{n−1} i·log i + cn
        = [2k/(n−1)] · ( Σ_{i=1}^{n/2} i·log i + Σ_{i=n/2+1}^{n−1} i·log i ) + cn
        ≤ [2k/(n−1)] · ( Σ_{i=1}^{n/2} i·(log n − 1) + Σ_{i=n/2+1}^{n−1} i·log n ) + cn
        ≤ k·n·log n − kn/4 − kn/(2(n−1)) + cn
        ≤ k·n·log n,  if k is large enough
∴ Tavg(n) = O(n log n)
2-way merge sort
• divide-and-conquer
• can be used for external sorting
• can be generalized to m-way
Algorithm Msort(A[1..n]);
if n > 1 then begin
  m := ⌊n/2⌋;
  Msort(A[1..m]);
  Msort(A[m+1..n]);
  Merge(A[1..m], A[m+1..n], B[1..n]);
  A[1..n] := B[1..n]
end;
Let k = 2^⌈log n⌉ (i.e. k is the smallest power of 2 that is ≥ n). If Merge takes O(n) time:
T(n) ≤ T(k) = 2T(k/2) + ck
            = 4T(k/4) + 2ck
            = 8T(k/8) + 3ck
            = …
            = ck·log₂k + k·O(1) = O(n log n)
202
The nonrecursive version
[Figure: the merging pattern of the nonrecursive version vs. the recursive version; both have about log n levels]
NOTE: The merging order may be different in the nonrecursive version.
type
  afile = array[1..max] of elementtype;
203
Merging two sorted lists
[Figure: X[l..m] and X[m+1..n] are merged into Z[l..n], using cursors i, j, k]
procedure merge(var X, Z: afile; l, m, n: integer);
// merge X[l..m] and X[m+1..n] into Z[l..n] //
// (this is just the union of ordered sets represented by sorted lists!) //
var i, j, k: integer;
begin
  i := l;  j := m+1;  k := l;
  while (i ≤ m) and (j ≤ n) do begin
    if X[i].key ≤ X[j].key then begin
      Z[k] := X[i];  i := i+1
    end
    else begin
      Z[k] := X[j];  j := j+1
    end;
    k := k+1
  end;
  if i > m then
    Z[k..n] := X[j..n]   // move the remaining items //
  else
    Z[k..n] := X[i..m]
end;
204
Time: O(n − l + 1)
procedure onepass(var X, Y: afile; n, l: integer);
// performs one pass of the merge sort: merges adjacent pairs of segments of length l from list X into list Y; n = |X| //
var i: integer;
begin
  i := 1;
  while i ≤ n − 2l + 1 do begin
    merge(X, Y, i, i+l−1, i+2l−1);
    i := i + 2l
  end;
  // merge the remaining segments of length < 2l //
  if (i + l − 1) < n then
    merge(X, Y, i, i+l−1, n)
  else
    Y[i..n] := X[i..n]
end;
205
Time: O(n)
procedure Msort(var X: afile; n: integer);
var l: integer;
    Y: afile;
begin
  // l is the size of the segments currently being merged //
  l := 1;
  while l < n do begin
    onepass(X, Y, n, l);
    l := 2*l;
    onepass(Y, X, n, l);
    l := 2*l
  end
end;
At most ⌈log₂n⌉ + 1 passes.
Each pass takes O(n) time.
Total: O(nlogn)
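A Python sketch (my own) of the same nonrecursive scheme: one pass merges adjacent segments of length l, and segment lengths double until the whole list is one run.

def merge(x, lo, mid, hi, y):
    """Merge sorted x[lo:mid] and x[mid:hi] into y[lo:hi]."""
    i, j, k = lo, mid, lo
    while i < mid and j < hi:
        if x[i] <= x[j]:
            y[k] = x[i]; i += 1
        else:
            y[k] = x[j]; j += 1
        k += 1
    y[k:hi] = x[i:mid] if i < mid else x[j:hi]   # move the remaining items

def msort(x):
    n = len(x)
    y = x[:]                       # second "file"
    l = 1                          # current segment length
    while l < n:
        for i in range(0, n, 2 * l):             # one pass: x -> y
            merge(x, i, min(i + l, n), min(i + 2 * l, n), y)
        x, y = y, x                # the next pass goes the other way
        l *= 2
    return x

print(msort([3, 5, 6, 4, 5, 9, 3, 7, 2, 8, 4, 6, 1]))
# [1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 9]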
206
Example
X = 3 5 6 4 5 9 3 7 2 8 4 6 1
[Figure: contents of the two lists after each pass, for segment lengths l = 1, 2, 4, 8; the final pass yields 1 2 3 3 4 4 5 5 6 6 7 8 9]
207
Obs: The lists X and Y are scanned sequentially, from left to right, ⌈log₂n⌉ + 1 times.
Bin Sorting
Is Ω(nlogn) the lower bound for sorting n elements?
Yes, if we make no assumption about the key type and only use comparisons such as key1 ≤ key2.
What if we know 1 ≤ key ≤ n and the n elements have distinct keys?
To sort such n elements:
for i := 1 to n do
  B[A[i].key] := A[i];            -- O(n)
or
for i := 1 to n do
  while A[i].key ≠ i do
    swap(A[i], A[A[i].key]);      -- O(n)
208
Example
Sorting records that have a small number of distinct keys:
n records, O(logn)
distinct keys
Can we do better than O(nlogn)?
An algorithm using a modified 2-3 tree (an AVL tree is also okay):
[Figure: a modified 2-3 tree containing the O(log n) distinct keys (e.g. 2, 4, 5, 6, 7, 9, 11), each leaf holding the list of records with that key]
209
size of tree: O(logn)
each insert: O(loglogn)
Total time: O(nloglogn)
Bin Sorting
Key = 1..m (any finite and discrete type)
[Figure: the bin table B with bins 1..m, each bin a linked list]
procedure binsort;
var i: integer;  v: keytype;
begin
  for i := 1 to n do                                          -- O(n)
    INSERT(A[i], END(B[A[i].key]), B[A[i].key]);
  for v := 2 to m do                                          -- O(m)
    CONCAT(B[1], B[v])
end;
210
Bin sorting when m = n^k for some k
Example: k = 2, keytype = 0..n²−1
Step 1: Place each integer i into bin i mod n (append i to the end of the list for bin i mod n).
Step 2: Concatenate the lists.
Step 3: Place each integer i into bin ⌊i/n⌋.
Step 4: Concatenate the lists.
Each step: O(n).  Total time: O(n).
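A Python sketch (mine, not from the notes) of the two-pass scheme for k = 2: sort by i mod n first, then by ⌊i/n⌋; both passes are stable, so the result is fully sorted.

def binsort_pass(items, key, m):
    """Stable bin sort of items into bins 0..m-1 selected by key(x)."""
    bins = [[] for _ in range(m)]
    for x in items:
        bins[key(x)].append(x)           # append to the end of its bin
    out = []
    for b in bins:                       # concatenate the bins in order
        out.extend(b)
    return out

def sort_small_keys(A, n):
    """Sort integers in 0..n*n-1 in O(n) time using two bin-sort passes."""
    A = binsort_pass(A, lambda i: i % n, n)      # least significant "digit"
    return binsort_pass(A, lambda i: i // n, n)  # most significant "digit"

print(sort_small_keys([45, 36, 21, 64, 60, 33, 12, 27, 30, 25], 10))
# [12, 21, 25, 27, 30, 33, 36, 45, 60, 64]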
211
n = 10
Given: 45, 36, 21, 64, 60, 33, 12, 27, 30, 25

i mod 10 =
BIN   CONTENTS
0     60, 30
1     21
2     12
3     33
4     64
5     45, 25
6     36
7     27
8
9

New list: 60, 30, 21, 12, 33, 64, 45, 25, 36, 27

⌊i/10⌋ =
BIN   CONTENTS
0
1     12
2     21, 25, 27
3     30, 33, 36
4     45
5
6     60, 64
7
8
9
212
Radix Sort
type
  keytype = record
    f1: t1;
    f2: t2;
    .
    .
    fk: tk        // each ti is a finite, discrete type //
  end;
key1 = (a1, a2, …, ak)
key2 = (b1, b2, …, bk)
key1 < key2 iff
1. a1 < b1, or
2. a1 = b1 and a2 < b2, or
…
k. a1 = b1, …, ak−1 = bk−1, and ak < bk
i.e. ∃ i, 0 ≤ i < k, s.t. aj = bj for 1 ≤ j ≤ i and ai+1 < bi+1
e.g. abc < aca  (this is called lexicographic order)
213
var
  Bi: array[ti] of linked list;    // one bin table per field //
procedure radixsort;
// binsort list A first on fk, concatenate the bins of Bk, binsort on fk−1, and so on //
begin
  for i := k downto 1 do begin
    for each value v of type ti do
      make Bi[v] empty;
    for each record r on list A do
      move r from A onto the end of bin Bi[r.fi];      // binsort on fi //
    for each value v of type ti, from lowest to highest, do
      concatenate Bi[v] onto the end of A
  end
end;
Time: Σ_{i=1}^{k} O(|ti| + n) = O(kn + Σ_{i=1}^{k} |ti|)
214
Example
A = hact, fact, sack, camp, duck, kuck, codd, less, more
Pass on f4 (last letter):
  D: codd   E: more   K: sack, duck, kuck   P: camp   S: less   T: hact, fact
Pass on f3:
  C: sack, duck, kuck, hact, fact   D: codd   M: camp   R: more   S: less
Pass on f2:
  A: sack, hact, fact, camp   E: less   O: codd, more   U: duck, kuck
Pass on f1:
  C: camp, codd   D: duck   F: fact   H: hact   K: kuck   L: less   M: more   S: sack
215
Odd-even merge sort
(Useful when you have a parallel computer)
Algorithm Odd-even-merge-sort(a0, a1, …, a2n−1)
1. Split the list a0, a1, …, a2n−1 into the two lists a0, a1, …, an−1 and an, an+1, …, a2n−1
2. Odd-even-merge-sort(a0, a1, …, an−1)
3. Odd-even-merge-sort(an, an+1, …, a2n−1)
4. Odd-even-merge(a0, a1, …, an−1; an, an+1, …, a2n−1)
Algorithm Odd-even-merge(a0, a1, …, an−1; b0, b1, …, bn−1)
1. c0, c1, …, cn−1 := Odd-even-merge(a0, a2, …, an−2; b0, b2, …, bn−2)
2. d0, d1, …, dn−1 := Odd-even-merge(a1, a3, …, an−1; b1, b3, …, bn−1)
3. For all i > 0, compare ci and di−1 and interchange if necessary
4. Return: c0 c1 d0 c2 d1 c3 d2 … cn−1 dn−2 dn−1
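A Python sketch (my own) of the odd-even merge: the even-indexed and odd-indexed elements of the two sorted lists are merged recursively, and adjacent c/d pairs are then compare-exchanged. The two input lists are assumed to have the same power-of-two length.

def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-two length."""
    n = len(a)
    if n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])     # even-indexed elements
    d = odd_even_merge(a[1::2], b[1::2])     # odd-indexed elements
    out = [c[0]]
    for i in range(1, n):                    # compare c[i] with d[i-1]
        out += [min(c[i], d[i - 1]), max(c[i], d[i - 1])]
    out.append(d[n - 1])
    return out

print(odd_even_merge([4, 5, 8, 11], [2, 9, 10, 27]))    # hypothetical input
# [2, 4, 5, 8, 9, 10, 11, 27]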
216
Example
Odd-even-merge(4, 5, 8, 11, 20, 25;  2, 9, 10, 27, 30, 31)
  Odd-even-merge of the even-indexed elements (4, 8, 20; 2, 10, 30) returns c: 2, 4, 8, 10, 20, 30
  Odd-even-merge of the odd-indexed elements (5, 11, 25; 9, 27, 31) returns d: 5, 9, 11, 25, 27, 31
After the compare-exchanges: 2, 4, 5, 8, 9, 10, 11, 20, 25, 27, 30, 31
Sequential time complexity:
Odd-even-merge: T1(n) = 2T1(n/2) + cn = O(n log n)
Odd-even-merge-sort: T2(n) = 2T2(n/2) + c1·n·log n = O(n log²n)
In parallel, Odd-even-merge takes O(log n) time,
217
and Odd-even-merge-sort takes O(log²n) time.
Odd-even-merge(a0 a1 a2 a3; a4 a5 a6 a7)
[Figure: the comparison network for merging two sorted 4-element lists; the even-indexed and odd-indexed elements are merged recursively into c and d, and adjacent c/d pairs are then compare-exchanged]
218
Lower bound for sorting
Definition:
Let B be a problem and f(n) a function. B requires Ω(f(n)) time if every algorithm for B has time complexity Ω(f(n)) (i.e., the running time is at least f(n) in the worst case for inputs of length n).
f(n) is a time lower bound for B.
Theorem:
Sorting by comparisons requires Ω(nlogn) time (in fact, Ω(nlogn) comparisons).
Assumption: the only operations on keys are comparisons of key values:
  key1 < key2 ?   (yes / no)
Without loss of generality, assume the keys are distinct.
decision trees
219
Let P be any sorting algorithm. Denote the input by A[1..n] : a1, a2, …, an.
Define a binary tree as follows:
[Figure: each internal node is a comparison A[i] < A[j]? with a yes branch and a no branch; each leaf is an outcome, i.e. a sorted list a_r1, a_r2, …, a_rn]
220
This is called the decision tree for P on inputs of size n.
Decision tree for bubble sort with n = 3
for i := 1 to 2 do
  for j := 3 downto i+1 do
    if A[j] < A[j−1] then swap(A[j], A[j−1])
A[1..3] = a b c
[Figure: the decision tree; the comparisons are A[3] < A[2]?, A[2] < A[1]?, A[2] < A[3]?, …, and the six leaves are the outcomes c b a, c a b, a c b, b c a, b a c, a b c]
221
Fact:
For any sorting algorithm A, the decision tree for A must have at least n! leaves.
Proof: There are n! outcomes when A sorts n elements.
Fact:
The depth of the decision tree must be at least ⌈log₂(n!)⌉.
Proof: Let the depth be d. Then n! ≤ 2^d, so d ≥ ⌈log₂(n!)⌉.
Corollary:
A requires at least ⌈log₂(n!)⌉ comparisons in the worst case.
n! ≈ (n/e)^n,  e = 2.7183
log₂(n!) ≈ n·log₂(n/e)
222
n·log₂(n/e) = Ω(nlogn)
∴ Sorting requires Ω(nlogn) comparisons.
∴ Sorting requires Ω(nlogn) time.
Average time complexity for sorting
  = the average depth of the leaves in the decision tree
Claim:
Among the n! leaves, at least half of them have depth ≥ log₂(n!/2).
Proof:
The maximum number of leaves with depth ≤ log₂(n!/2) − 1 is 2^(log₂(n!/2)) = n!/2.
∴ on average, sorting requires Ω(log₂(n!)/2) = Ω(nlogn) time.
223
Problem
Given a1, a2, …, an s.t. a1 < a2 < … < an, and x, find i s.t. ai = x.
Binary search: O(logn) time.
Fact: searching requires Ω(logn) time.
Proof: any of a1, a2, …, an could be x (and x may not be present at all),
∴ there are at least n + 1 outcomes when we search for x in a1, …, an,
∴ a decision tree must have depth ≥ log₂(n+1).
Problem
Given a1, a2, …, an, find the smallest element.
[Figure: comparison nodes ai < aj]
The n−1 elements other than the minimum must each have lost some comparison, so n−1 comparisons are needed.
224
External sorting
Assumption:
The number of data items to be sorted is too large to fit in main memory.
The data items (records) are stored on external storage devices in the form of (sequential) files.
External storage devices:
Magnetic tape
[Figure: a tape with blocks i, i+1, … separated by inter-block gaps, and a read/write head]
Operations: Read(B), Write(B), fast-forward, rewind
225
Magnetic Disk
[Figure: a disk surface showing a track, a sector (= block), and the R/W head]
To access a sector:
1. locate the correct track by shifting the R/W head;
2. wait until the correct sector arrives.
The time needed to access a block = seek time + actual R/W time >> main-memory access time.
file: a sequence of blocks; each block holds a fixed number of records.
226
 In external sorting, the dominating factor is the number of block accesses.
 It is desirable to scan a file from beginning to end.
The model
[Figure: file1, file2, file3, … on disk; the CPU; main memory holding C blocks (buffers)]
Objective: sorting with minimum number of passes through the file (thus,
minimum number of block accesses)
227
Bubble, insertion, …, quick, heap sorts: require on the order of n passes.
2-way merge sort: requires only ⌈log₂n⌉ passes!
ASSUME THAT FILES ARE STORED
ON DISKS. THUS, SEEK TIME IS
THE “SAME” FOR ALL BLOCKS.
EXAMPLE
Sort file F = A1,A2,…A2100
A block = 100 records
Working main memory space
= 3 blocks ( used as buffers)
228
Step 1:
Internally sort three blocks (300 records) at a time. Store the resulting file on disk.
run1: 1-300   run2: 301-600   run3: 601-900   run4: 901-1200   run5: 1201-1500   run6: 1501-1800   run7: 1801-2100
Step 2:
partition the main memory into three blocks.
Two are used as input buffers and the third is used as an output
buffer.
Merge runs 1 and 2 .
Algorithm Merge(R1, R2):
Read a block from R1; read a block from R2.
Merge the records in the input buffers and store the result in the output buffer.
If the output buffer gets full, write its contents to the disk and clear the buffer.
If an input buffer gets empty, read the next block from the same run.
229
Merge runs 3 and 4, then 5 and 6 , then copy run 7
The result of this is a file of 4 runs.
Merge these runs and produce a file of two runs.
Merge the two runs to obtain a single run (i.e., a sorted file).
[Figure: the original file F and the files F1, F2, F3 produced by the successive merge passes]
230
Notes:
1. If the number of initial runs is m, then ⌈log₂m⌉ passes suffice.
2. If the device is tape, then we need four tapes.
[Table: how the runs are distributed over the four tapes from pass to pass; initially tapes 1 and 2 hold runs 1, 3, 5, 7 and 2, 4, 6, the merged runs are written alternately to tapes 3 and 4, and so on until a single run remains]
3. Temp files can be discarded after being used.
4. k-way merge (e.g. 3-way).
Generally, k-way merge sort requires ⌈log_k m⌉ = ⌈log₂m / log₂k⌉ passes.
It needs k+1 buffers and more comparisons (k−1 per record) in each pass.
For tapes, k-way merge requires 2k tapes.
232
General algorithm design techniques
Divide-and-conquer: top-down, recursive; e.g. merge sort, quick sort
Dynamic programming: bottom-up; e.g. longest common subsequence
Greedy: e.g. shortest paths, minimum-cost spanning tree
Brute force
Backtracking: e.g. rat-in-maze
Divide-and-conquer
To solve problem A:
if A is small enough then
  solve it directly
else begin
  break A into smaller problems A1, A2, …, Ak (smaller instances of the same problem);
  solve Ai for each i = 1, 2, …, k;
  combine the solutions for A1, …, Ak to obtain the solution for A
end
233
Example: Towers of Hanoi
[Figure: three pegs A, B, C]
Algorithm Move(n, A, B)
// move n disks from A to B //
if n = 1 then
  move the disk to B
else begin
  Move(n−1, A, C);
  Move(1, A, B);
  Move(n−1, C, B)
end
T(n) = c1 if n = 1;  2T(n−1) + c2 otherwise
     = O(2ⁿ)
234
Example
Given n integers, find both the maximum and the minimum.
Algorithm maximin(A[1..n], max, min)
if n = 1 then
  max := min := A[1]
else if n = 2 then
  if A[1] < A[2] then begin
    max := A[2];  min := A[1]
  end else begin
    max := A[1];  min := A[2]
  end
else begin   // n > 2 //
  maximin(A[1..n/2], max1, min1);
  maximin(A[n/2+1..n], max2, min2);
  if max1 < max2 then max := max2 else max := max1;
  if min1 ≤ min2 then min := min1 else min := min2
end
235
C(n) = 1                    if n = 2
     = 2C(n/2) + 2          otherwise   (number of comparisons)
C(n) = (3/2)n − 2   (by induction)
Dynamic programming
There are situations where:
(i) there is no way to divide a problem into a small number of subproblems;
(ii) the subproblems overlap each other (too much redundancy if divide-and-conquer is used);
(iii) the total number of subproblems to tackle is not large, i.e. polynomial (nᵏ, usually with k = 2, 3).
236
Dynamic programming approach:
Systematically solve all the subproblems, smallest ones first.
Keep track of the solutions to the solved subproblems by means of a table.
Solutions to larger subproblems are found by combining solutions to smaller subproblems.
Example
Longest common subsequence problem
(LCS)
sequence : x =
a1a2…an
subsequence of x: a sequence obtained from x by deleting some characters
e.g. ab and ca are subsequences of cab
LCS Problem:
given x = a1a2…an
y = b1b2…bm
find the length of an LCS of x and y
237
Previous solution (using sets): O(p·logn) time, where p = the number of pairs of positions, one from each sequence, that have the same character.
In the worst case, p = O(nm), so the time is O(nm·logn).
Dynamic programming solution
Given x = a1a2…an and y = b1b2…bm.
Define an (n+1) × (m+1) matrix L:
L[i,j] = the length of an LCS of a1a2…ai and b1b2…bj, for all 0 ≤ i ≤ n, 0 ≤ j ≤ m.
Note: L[0,j] = L[i,0] = 0.
L[n,m] is the length of the LCS of x and y.
Each L[i,j] is a subproblem.
238
L[0,j] = 0 for 0 ≤ j ≤ m,   L[i,0] = 0 for 0 ≤ i ≤ n
For 1 ≤ i ≤ n, 1 ≤ j ≤ m:
L[i,j] = max( L[i−1,j],
              L[i,j−1],
              L[i−1,j−1] + 1 if ai = bj, and 0 otherwise )
a1 … ai−1 ai
b1 … bj−1 bj
239
[Figure: the (n+1) × (m+1) matrix L; row 0 and column 0 are all 0, L[i,j] is computed from L[i−1,j], L[i,j−1] and L[i−1,j−1], and L[n,m] holds the solution]
Algorithm LCS
// evaluate matrix L row by row , with row 0 first //
240
for j := 0 to m do
  L[0,j] := 0;
for i := 1 to n do
  L[i,0] := 0;
for i := 1 to n do
  for j := 1 to m do begin
    if ai = bj then
      temp := L[i−1,j−1] + 1
    else
      temp := 0;
    L[i,j] := max(L[i−1,j], L[i,j−1], temp)
  end;
writeln(L[n,m])
Time = O(nm)
Space = O(min(n,m))
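A Python sketch (my own) of the table-filling algorithm; for clarity it keeps the full (n+1) × (m+1) table, although two rows would suffice for the O(min(n,m)) space bound.

def lcs_length(x, y):
    """Length of a longest common subsequence of strings x and y."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]     # row 0 and column 0 are 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = L[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1] else 0
            L[i][j] = max(L[i - 1][j], L[i][j - 1], diag)
    return L[n][m]

print(lcs_length("cab", "bbab"))   # 2  (an LCS is "ab"; hypothetical strings)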
Recursive solution (divide-and-conquer)
Algorithm LCS(n, m)
241
if n = 0 or m = 0 then
  LCS := 0
else begin
  l1 := LCS(n−1, m);
  l2 := LCS(n, m−1);
  if an = bm then
    l3 := LCS(n−1, m−1) + 1
  else
    l3 := 0;
  LCS := max(l1, l2, l3)
end;
T(n,m) = T(n−1,m) + T(n,m−1) + T(n−1,m−1) + c = O(3^(n+m))
Dynamic programming example 2
World Series Odds
Problem:
Teams A and B play a match. Whoever wins n games first wins
the match.
242
Assumption: A and B are equally competent, i.e., each has a 50% chance of
winning a particular game.
P(i,j): the probability that A will eventually win the match, given that A still needs i games to win (i.e., A has won n−i games) and B still needs j games, 0 ≤ i, j ≤ n.
We want to compute P(s,t) for some particular 0 ≤ s, t ≤ n.
P(0,j) = 1   for 1 ≤ j ≤ n
P(i,0) = 0   for 1 ≤ i ≤ n
P(i,j) = ( P(i−1,j) + P(i,j−1) ) / 2   for 1 ≤ i, j ≤ n
243
[Figure: the (n+1) × (n+1) table P; row 0 is all 1s, column 0 is all 0s, and P(i,j) is computed from P(i−1,j) and P(i,j−1)]
Order of evaluation:
1.
row by row/column by column
2.
diagonal
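A Python sketch (mine, not from the notes) of the table evaluation, filling row by row; P[i][j] is the probability that A eventually wins when A still needs i games and B still needs j, and n = 4 is a hypothetical match length.

def odds(n):
    """P[i][j] = Pr(A wins the match | A needs i more wins, B needs j more)."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        P[0][j] = 1.0                 # A has already won
    for i in range(1, n + 1):
        P[i][0] = 0.0                 # B has already won
        for j in range(1, n + 1):
            P[i][j] = (P[i - 1][j] + P[i][j - 1]) / 2
    return P

P = odds(4)
print(P[4][4])   # 0.5      (symmetric situation)
print(P[2][3])   # 0.6875   (A needs 2 wins, B needs 3)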
Greedy Algorithm
Setting :
given n objects a1,a2,…an,
Each with a weight ( or cost) w(ai)
244
We want to select a subset of objects a_i1, a_i2, …, a_im, subject to some constraint, such that
  Σ_{j=1}^{m} w(a_ij)
is minimum.
Example: Coin Changing
A1 = {c1, c2, …, cn} is a set of distinct coin types, c1 > c2 > … > cn ≥ 1.
How do we make up an exact amount x using a minimum total number of coins?
If cn = 1, then a greedy algorithm can be used.
Algorithm coinchange(x);
i := 1;
while x ≠ 0 do begin
  if ci ≤ x then begin     // select the largest coin whose value is ≤ x //
    writeln(ci);
    x := x − ci
  end
  else
    i := i + 1
end
245
e.g.
c1 = 25¢ c2 = 10¢ c3 =5¢ c4 = 1¢
x =73¢
change :
c1, c1, c2, c2, c4, c4, c4
Notes:
1. The algorithm doesn't necessarily generate change with the minimum total number of coins.
   e.g., c1 = 5, c2 = 4, c3 = 1, x = 8  (greedy uses 5+1+1+1, but 4+4 is better)
2. It does so if A1 = { k^(n−1), k^(n−2), …, k^0 } for some k.
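A Python sketch (my own) of the greedy procedure, followed by the counterexample from note 1; the coin lists are the ones used in the examples above.

def coinchange(x, coins):
    """Greedy change-making; coins must be in decreasing order, with 1 last."""
    change = []
    i = 0
    while x != 0:
        if coins[i] <= x:          # largest coin whose value is <= x
            change.append(coins[i])
            x -= coins[i]
        else:
            i += 1
    return change

print(coinchange(73, [25, 10, 5, 1]))   # [25, 25, 10, 10, 1, 1, 1]
print(coinchange(8, [5, 4, 1]))         # [5, 1, 1, 1]  -- not minimal (4 + 4 is)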
Matching.
246
[Figure: a bipartite graph with V1 = {1,…,5} and V2 = {6,…,10}]
1. Start with M = ∅.
2. Find an augmenting path P relative to M and replace M by M ⊗ P.
3. Repeat (2) until no further augmenting path exists; then M is maximal.
[Figure: the bipartite graph, initially with no matching edges]
247
P = 1, 6;   M = ∅ ⊗ {(1,6)} = {(1,6)}
[Figure: the graph with the current matching]
P = {(2,6), (1,6), (1,7)};   M = {(1,6)} ⊗ P = {(2,6), (1,7)}
[Figure: the graph with the current matching]
248
P = {(3,7), (1,7), (1,6), (6,2), (2,9)};   M = {(2,6), (1,7)} ⊗ P = {(3,7), (1,6), (2,9)}
249
[Figure: the graph with the current matching; a partial path that ends at a matched vertex doesn't work]
P = {(4,9), (2,9), (1,6), (1,8)};   M = {(3,7), (1,6), (2,9)} ⊗ P = {(4,9), (3,7), (1,8)}
[Figure: the graph with the current matching]
250
P = {(2,9), (4,9), (4,10)};   M = {(4,9), (1,8), (3,7)} ⊗ P = {(2,9), (1,8), (4,10), (3,7)}
[Figure: the graph with the current matching]
251
P = {(5,6)};   M = {(2,9), (1,8), (4,10), (3,7)} ⊗ P = {(2,9), (1,8), (4,10), (3,7), (5,6)}
252