CIS 2520
Data Structures: Review
Linked list:
A linked list represents a sequence:
every node but one has a predecessor, and
every node but one has a successor.

    head -> ... -> tail -> null

/* recursive definition */
typedef struct node {
    int data;          // whatever is useful in the node
    struct node* next; // link to the next node
} node;

Recursive versions of print_list:

/* print list recursively (from head to tail) */
void print_list_1(node* p){
    if (p){
        printf("data: %d \n", p->data);
        print_list_1(p->next);
    }
}

/* print list recursively (from tail back to head) */
void print_list_2(node* p){
    if (p){
        print_list_2(p->next);
        printf("data: %d \n", p->data);
    }
}

Ordered Linked List:
Insert nodes in their sorted position.

typedef struct node{
    int data;
    struct node* next;
} node;
The list head -> d1 -> d2 -> ... -> dn -> null is kept sorted,
where d1 <= d2 <= d3 <= ... <= dn.
Question: how to insert dk into the list?

Case 1: head == NULL (the list is empty).
Case 2: dk should be inserted in front of the list (dk <= d1).
Case 3: dk should be inserted after a node (e.g. d2 <= dk <= d3).

Version 1 (walk with two pointers, prev and curr):

    while (curr && data > curr->data){
        prev = curr;
        curr = curr->next;
    }

Version 2 (attach a dummy node in front of the list, with p starting at the dummy; *hp is the head pointer):

    while (p->next && data > p->next->data)
        p = p->next;
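Putting the Version 2 loop to work, a complete insertion function might look like the sketch below; the function name insert_sorted and the double-pointer parameter hp are our own, since the slides only show the search loop. The dummy node lets Cases 1, 2 and 3 share one code path.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int data;
    struct node* next;
} node;

/* Insert data into its sorted position; *hp points to the head pointer.
   The dummy node makes the empty-list and front-insert cases uniform. */
void insert_sorted(node** hp, int data) {
    node dummy;
    dummy.next = *hp;
    node* p = &dummy;
    while (p->next && data > p->next->data)  /* the Version 2 loop */
        p = p->next;
    node* n = malloc(sizeof(node));
    n->data = data;
    n->next = p->next;
    p->next = n;
    *hp = dummy.next;
}
```

Inserting 3, then 1, then 2 into an empty list exercises all three cases and yields the list 1 -> 2 -> 3.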
Doubly Linked List:
A list which can be traversed either forward or backward:

typedef struct node{
    int data;
    struct node* next;
    struct node* prev;
} node;

    null <- head <-> ... <-> tail -> null

Binary search function:
Binary search is used to find a target value in a sorted table.
We start by comparing the target value with the table's middle element.
Since the table is sorted, if the target is larger, we can ignore all values
smaller than the middle element, and vice versa.
We stop when we've found the target or no values are left to search.

int* bsearch(int target, int *table, int n){
    int *min = table;
    int *max = table + (n - 1);
    int *mid;
    while (min <= max) {
        mid = min + (max - min) / 2;
        if (target == *mid) return mid;
        else if (target > *mid)
            min = mid + 1;
        else
            max = mid - 1;
    }
    return NULL;
}

Example: target = 36 in the table
    2 5 8 11 15 17 22 25 34 36 43 57 59 66 72
min, mid and max narrow in on 36: mid points first at 25, then at 57, then at 36.
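A quick check of the search above; this is a self-contained copy renamed int_bsearch so it does not clash with the stdlib bsearch declared in <stdlib.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Binary search over a sorted int table; returns a pointer to the
   matching element, or NULL if the target is absent. */
int* int_bsearch(int target, int *table, int n){
    int *min = table;
    int *max = table + (n - 1);
    while (min <= max) {
        int *mid = min + (max - min) / 2;   /* middle of the current range */
        if (target == *mid) return mid;
        else if (target > *mid) min = mid + 1;
        else max = mid - 1;
    }
    return NULL;
}
```

On the 15-element example table, searching for 36 probes 25, 57, then 36.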
Three Characteristics of Recursion
A recursive function:
- Calls itself recursively
- Has some terminating condition
- Moves "closer" to the terminating condition.

Useful recursion
To be useful the recursion must terminate, so there must be at least one
non-recursive case, such as: 0!
as well as recursive cases, such as: n * (n − 1)!
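The three characteristics show up in the factorial example the slides allude to; a minimal sketch:

```c
#include <assert.h>

/* Factorial: one non-recursive (base) case, 0! = 1,
   and a recursive case, n * (n-1)!, that moves closer to the base. */
long factorial(int n) {
    if (n == 0) return 1;        /* terminating condition */
    return n * factorial(n - 1); /* calls itself on a smaller n */
}
```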
Algorithm Analysis
The analysis of an algorithm determines the running time in big-Oh notation.
To perform the analysis:
- We find the worst-case number of primitive operations executed as a function of the input size
- We express this function with big-Oh notation
Example:
- We determine that algorithm arrayMax executes at most 6n − 2 primitive operations
- We say that algorithm arrayMax "runs in O(n) time"
Since constant factors and lower-order terms are eventually dropped anyhow,
we can disregard them when counting primitive operations.

Big-Oh Rules
If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e.,
1. Drop lower-order terms
2. Drop constant factors
Use the smallest possible class of functions:
- Say "2n is O(n)" instead of "2n is O(n^2)"
Use the simplest expression of the class:
- Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"

Properties of the O notation
- Constant factors may be ignored: for all k > 0, kf is O(f)
- f is O(g) is transitive: if f is O(g) and g is O(h) then f is O(h)
- Product of upper bounds is an upper bound for the product:
  if f is O(g) and h is O(r) then f*h is O(g*r)
- Fastest growing term dominates a sum: if f is O(g) and h is O(r),
  then f + h is O(max(g, r));
  e.g. an^4 + log n is O(max(n^4, log n)) = O(n^4)
- A polynomial's growth rate is determined by its leading term:
  if f is a polynomial of degree d, then f is O(n^d);
  e.g. 10n^4 + 5n^6 + n^2 is O(n^6)
- All logarithms grow at the same rate: log_b n is O(log_d n) for all b, d > 1

Reasonable vs. Unreasonable
Reasonable algorithms have polynomial factors: O(log N), O(N), O(N^K), where K is a constant.
Unreasonable algorithms have exponential factors: O(2^N), O(N!), O(N^N).

Abstract Data Types (ADTs)
An abstract data type (ADT) is an abstraction of a data structure.
ADT refers to a way of packaging some intermediate-level data structures and
their operations into a useful collection whose properties have been carefully studied.
An ADT specifies:
- Data stored
- Operations on the data (clean, simple interface)
- Error conditions associated with operations
The Stack and Queue ADT
The Stack ADT stores arbitrary elements.
Insertions and deletions follow the LIFO scheme.
Main stack operations:
- push(element): inserts an element
- element pop(): removes and returns the last inserted element

The Queue ADT stores arbitrary elements.
Insertions and deletions follow the FIFO scheme.
Main queue operations:
- enqueue(element): inserts an element at the end of the queue
- element dequeue(): removes and returns the element at the front of the queue

Binary Tree
A binary tree is a tree with the following properties:
- Each internal node has two children
- The children of a node are an ordered pair
We call the children of an internal node the left child and the right child.
Alternative recursive definition: a binary tree is either
- a tree consisting of a single node, or
- a tree whose root has an ordered pair of children, each of which is a binary tree
Applications:
- arithmetic expressions
- decision processes
- searching

Binary Tree Traversal

function preOrder(v)
    visit(v);
    if (isInternal(v)){ preOrder(leftChild(v)); }
    if (isInternal(v)){ preOrder(rightChild(v)); }

function inOrder(v)
    if (isInternal(v)){ inOrder(leftChild(v)); }
    visit(v);
    if (isInternal(v)){ inOrder(rightChild(v)); }

function postOrder(v)
    if (isInternal(v)){ postOrder(leftChild(v)); }
    if (isInternal(v)){ postOrder(rightChild(v)); }
    visit(v);

What is a heap
A heap is a binary tree storing keys at its internal nodes and satisfying
the following properties:
- Heap-Order: for every internal node v other than the root,
  key(v) ≥ key(parent(v))
- Complete Binary Tree: let h be the height of the heap
  - for i = 0, ..., h − 1, there are 2^i nodes of depth i
  - at depth h − 1, the internal nodes are to the left of the external nodes
The last node of a heap is the rightmost internal node of depth h − 1.
(Example heap keys: 2, 5, 6, 7, 9, with 2 at the root.)

Upheap
After the insertion of a new key k, the heap-order property may be violated.
Algorithm upheap restores the heap-order property by swapping k along an
upward path from the insertion node.
Upheap terminates when the key k reaches the root or a node whose parent has
a key smaller than or equal to k.
Since a heap has height O(log n), upheap runs in O(log n) time.

Downheap
After replacing the root key with the key k of the last node, the heap-order
property may be violated.
Algorithm downheap restores the heap-order property by swapping key k along a
downward path from the root.
Downheap terminates when key k reaches a leaf or a node whose children have
keys greater than or equal to k.
Since a heap has height O(log n), downheap runs in O(log n) time.
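Upheap and downheap can be sketched on the standard array encoding of a heap (the array layout and names below are ours; the slides draw the heap as a tree): the root lives at index 1, the parent of index i is i/2, and its children are 2i and 2i+1.

```c
#include <assert.h>

#define MAXH 64

/* Array-encoded min-heap: keys in a[1..size], smallest key at a[1]. */
typedef struct { int a[MAXH]; int size; } heap;

/* Swap the new key upward until its parent is smaller or equal. */
void upheap(heap* h, int i) {
    while (i > 1 && h->a[i] < h->a[i/2]) {
        int t = h->a[i]; h->a[i] = h->a[i/2]; h->a[i/2] = t;
        i /= 2;
    }
}

/* Swap the root key downward until both children are greater or equal. */
void downheap(heap* h, int i) {
    for (;;) {
        int c = 2*i;
        if (c > h->size) break;                       /* reached a leaf */
        if (c + 1 <= h->size && h->a[c+1] < h->a[c]) c++;
        if (h->a[i] <= h->a[c]) break;                /* heap-order holds */
        int t = h->a[i]; h->a[i] = h->a[c]; h->a[c] = t;
        i = c;
    }
}

void heap_insert(heap* h, int k) { h->a[++h->size] = k; upheap(h, h->size); }

int heap_remove_min(heap* h) {
    int min = h->a[1];
    h->a[1] = h->a[h->size--];   /* move last node's key to the root */
    downheap(h, 1);
    return min;
}
```

Inserting the example keys 5, 2, 9, 6, 7 and repeatedly removing the minimum returns them in sorted order, which is exactly heap-sort.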
Binary Search Tree
A binary search tree is a binary tree storing keys (or key-element pairs)
at its internal nodes and satisfying the following property:
Let v be a tree node, and L, R be subtrees such that L is the left subtree
of v and R is the right subtree of v. We have
    keys(L) ≤ key(v) ≤ keys(R)
An inorder traversal of a binary search tree visits the keys in increasing order.
(Example: root 6, with children 2 and 9, and 1, 4, 8 below.)
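The defining property and the inorder claim above can be checked with a short sketch (names bst_insert/bst_inorder are ours; external NULL children stand in for the external nodes that store no items):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct bst_node {
    int key;
    struct bst_node *left, *right;
} bst_node;

/* Insert key, keeping keys(L) <= key(v) <= keys(R) at every node. */
bst_node* bst_insert(bst_node* v, int key) {
    if (!v) {
        v = malloc(sizeof(bst_node));
        v->key = key;
        v->left = v->right = NULL;
    } else if (key < v->key)
        v->left = bst_insert(v->left, key);
    else
        v->right = bst_insert(v->right, key);
    return v;
}

/* Inorder traversal writes keys into out[] in increasing order. */
void bst_inorder(bst_node* v, int* out, int* n) {
    if (!v) return;
    bst_inorder(v->left, out, n);
    out[(*n)++] = v->key;
    bst_inorder(v->right, out, n);
}
```

Inserting the example keys 6, 2, 9, 1, 4, 8 and walking the tree inorder yields 1, 2, 4, 6, 8, 9.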
AVL Tree
AVL trees are balanced.
An AVL Tree is a binary search tree such that for every internal node v of T,
the heights of the children of v can differ by at most 1.
An example of an AVL tree where the heights are shown next to the nodes:
root 44 (height 4), with children 17 (height 2) and 78 (height 3);
17 has child 32; 78 has children 50 and 88; 50 has children 48 and 62.
External nodes do not store items (NULLs).

If an insertion(w) causes T to become unbalanced, we travel up the tree from
the newly created node until we find the first node x such that its
grandparent z is an unbalanced node.
If a remove(w) can cause T to become unbalanced, let z be the first
unbalanced node encountered while traveling up the tree from w. Also, let y
be the child of z with the larger height, and let x be the child of y with
the larger height.
To rebalance the subtree rooted at z, we must perform a restructuring.

Cut/Link Restructure Algorithm
1. Number the 7 parts (the nodes x, y, z and their four subtrees T0, T1, T2, T3)
   by doing an in-order-traversal. (Note that x, y and z are now renamed a, b, c
   based upon their order within the traversal.)
2. Now cut x, y and z, in that order (child, parent, grandparent), and place
   them in their inorder rank in the array.
3. Make a new tree which is balanced and put the 7 parts from the old tree
   into the new tree so that the numbering is still correct when we do an
   in-order traversal of the new tree.
4. Now we can re-link these subtrees to the main tree: link in rank 4 (b)
   where the subtree's root formerly was.
This works regardless of how the tree is originally unbalanced.
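The height condition in the AVL definition can be stated directly as code; this checker is our own sketch (the slides do not give one), using height −1 for an empty subtree so leaves have height 0:

```c
#include <assert.h>
#include <stddef.h>

typedef struct tnode { int key; struct tnode *left, *right; } tnode;

/* Height of a (sub)tree; the empty tree has height -1. */
int height(tnode* v) {
    if (!v) return -1;
    int hl = height(v->left), hr = height(v->right);
    return 1 + (hl > hr ? hl : hr);
}

/* AVL property: at every node, the children's heights differ by at most 1. */
int is_avl(tnode* v) {
    if (!v) return 1;
    int diff = height(v->left) - height(v->right);
    if (diff < -1 || diff > 1) return 0;
    return is_avl(v->left) && is_avl(v->right);
}
```

A right-leaning chain 1 -> 2 -> 3 fails the check at its root, which is exactly the situation the cut/link restructuring repairs.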
(2,4) Tree
A (2,4) tree (also called 2-4 tree or 2-3-4 tree) is a multi-way search tree
with the following properties:
- Node-Size Property: every internal node has at most four children
- Depth Property: all the external nodes have the same depth
Depending on the number of children, an internal node of a (2,4) tree is
called a 2-node, 3-node or 4-node.

Insertion
We insert a new item (k, o) at the parent v of the leaf reached by searching for k:
- We preserve the depth property, but
- we may cause an overflow (i.e., node v may become a 5-node)
Example: inserting key 30 causes an overflow: the node v holding 27 32 35
becomes the 5-node 27 30 32 35.
Overflow and Split
We handle an overflow at a 5-node v with a split operation:
- let v1 ... v5 be the children of v and k1 ... k4 be the keys of v
- node v is replaced by nodes v' and v":
  - v' is a 3-node with keys k1 k2 and children v1 v2 v3
  - v" is a 2-node with key k4 and children v4 v5
- key k3 is inserted into the parent u of v (a new root may be created)
The overflow may propagate to the parent node u.
(In the running example, the 5-node 27 30 32 35 splits into 27 30 and 35,
and 32 moves up into the parent, turning 15 24 into 15 24 32.)

Huffman Encoding Trie
Greedy Approach:
- Sort characters by frequency
- Form the two lowest weight nodes into a sub-tree
  (sub-tree weight = sum of weights of its nodes)
- Move the new tree to its correct place
Example: X = abracadabra
Frequencies: a = 5, b = 2, c = 1, d = 1, r = 2
In the resulting trie, a gets a 1-bit code and b, c, d, r get the
3-bit codes 100, 101, 110, 111.

Graph
A graph is a pair (V, E), where
- V is a set of nodes, called vertices
- E is a collection of pairs of vertices, called edges
- Vertices and edges are nodes and store elements
Notation:
- n: number of vertices
- m: number of edges
- deg(v): degree of vertex v
Property 1: Σv deg(v) = 2m
  Proof: each endpoint is counted twice.
Property 2: in an undirected graph with no self-loops and no multiple edges,
m ≤ n(n − 1)/2
  Proof: each vertex has degree at most (n − 1).
Graphs - Data Structures
Vertices:
- Map to consecutive integers
- Store vertices in an array

Edges - Adjacency Matrix:
- Booleans: 1 - edge exists, 0 - no edge
- O(|V|^2) space (where |V| refers to the number of vertices)
- Undirected representation: the matrix is symmetric

Edges - Adjacency Lists:
- For each vertex, a list of vertices "attached" to it
- O(|E|) space
- Better for sparse graphs
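Property 1 above (the degree sum equals 2m) can be checked directly on an adjacency-matrix representation; the 5-vertex example graph below is our own:

```c
#include <assert.h>

#define NV 5

/* Undirected example graph as a boolean adjacency matrix
   (1 = edge exists, 0 = no edge); note the matrix is symmetric.
   Edges: (0,1), (0,4), (1,2), (2,3), (2,4). */
int adj[NV][NV] = {
    {0,1,0,0,1},
    {1,0,1,0,0},
    {0,1,0,1,1},
    {0,0,1,0,0},
    {1,0,1,0,0},
};

/* deg(v): count the 1s in v's row. */
int degree(int v) {
    int d = 0;
    for (int u = 0; u < NV; u++) d += adj[v][u];
    return d;
}

/* m: each undirected edge is stored twice in the matrix. */
int edge_count(void) {
    int twice_m = 0;
    for (int v = 0; v < NV; v++) twice_m += degree(v);
    return twice_m / 2;
}
```

Here m = 5 and the degrees 2, 2, 3, 1, 2 sum to 10 = 2m, as Property 1 requires.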
Spanning Trees
A spanning tree of G is a subgraph that contains all the vertices of G.
A spanning tree of a connected graph is a spanning subgraph that is a tree.
A spanning tree is not unique unless the graph is a tree.

Depth-First Search
Depth-first search (DFS) is a general technique for traversing a graph.
A DFS traversal of a graph G:
- Visits all the vertices and edges of G
- Determines whether G is connected
- Computes the connected components of G
- Computes a spanning forest of G
DFS on a graph with n vertices and m edges takes O(n + m) time.
DFS can be further extended to solve other graph problems:
- Find and report a path between two given vertices
- Find a cycle in the graph
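Using DFS to decide connectivity, as described above, takes only a few lines on an adjacency matrix (the example graph is our own):

```c
#include <assert.h>

#define NV 5

/* Boolean adjacency matrix of an undirected example graph.
   Edges: (0,1), (0,4), (1,2), (2,3), (2,4). */
int g[NV][NV] = {
    {0,1,0,0,1},
    {1,0,1,0,0},
    {0,1,0,1,1},
    {0,0,1,0,0},
    {1,0,1,0,0},
};

int visited[NV];

/* Recursive DFS from v: marks every vertex reachable from v. */
void dfs(int v) {
    visited[v] = 1;
    for (int u = 0; u < NV; u++)
        if (g[v][u] && !visited[u])
            dfs(u);
}

/* G is connected iff a DFS from vertex 0 visits all the vertices. */
int is_connected(void) {
    for (int v = 0; v < NV; v++) visited[v] = 0;
    dfs(0);
    for (int v = 0; v < NV; v++)
        if (!visited[v]) return 0;
    return 1;
}
```

The edges followed to previously unvisited vertices form exactly the spanning tree (or spanning forest) the slide mentions.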
Shortest path: Dijkstra's Algorithm
For a graph G = (V, E), Dijkstra's algorithm keeps two sets of vertices:
- S: vertices whose shortest paths have already been determined
- V-S: the remainder
It also keeps:
- d: best estimates of the shortest path to each vertex
- π: predecessors for each vertex

Dijkstra's Algorithm Operation
Initialise d and π:
- For each vertex j in V: dj = ∞, πj = nil
  (initial estimates are all ∞; no connections)
- Source distance: ds = 0
Set S to empty.
While V-S is not empty:
- Sort V-S based on d
- Add u, the closest vertex in V-S, to S (s is added first!)
- Relax all the vertices still in V-S connected to u
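The operation above can be sketched on an adjacency-matrix graph. The example graph and the array names are ours, and the "sort V-S" step is replaced by a linear scan for the minimum, which selects the same vertex u:

```c
#include <assert.h>
#include <limits.h>

#define NV 5
#define INF INT_MAX

/* Weighted undirected example graph; 0 means no edge. */
int w[NV][NV] = {
    {  0, 10,  0, 30, 100},
    { 10,  0, 50,  0,   0},
    {  0, 50,  0, 20,  10},
    { 30,  0, 20,  0,  60},
    {100,  0, 10, 60,   0},
};

int d[NV];    /* best estimates of shortest path to each vertex */
int pi[NV];   /* predecessors (-1 stands for nil) */
int inS[NV];  /* 1 if the vertex's shortest path is already determined */

void dijkstra(int s) {
    for (int j = 0; j < NV; j++) { d[j] = INF; pi[j] = -1; inS[j] = 0; }
    d[s] = 0;                     /* source distance ds = 0 */
    for (int iter = 0; iter < NV; iter++) {
        /* pick u, the closest vertex still in V-S */
        int u = -1;
        for (int v = 0; v < NV; v++)
            if (!inS[v] && (u == -1 || d[v] < d[u])) u = v;
        if (d[u] == INF) break;   /* remaining vertices unreachable */
        inS[u] = 1;               /* move u into S (s is added first) */
        /* relax all vertices still in V-S connected to u */
        for (int v = 0; v < NV; v++)
            if (!inS[v] && w[u][v] && d[u] + w[u][v] < d[v]) {
                d[v] = d[u] + w[u][v];
                pi[v] = u;
            }
    }
}
```

From source 0 this yields d = {0, 10, 50, 30, 60}: the shortest path to vertex 4 goes through 3 and 2 rather than over the direct weight-100 edge.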
Dijkstra's Algorithm - Time
Complexity:
- Similar to MST algorithms
- Key step is the sort on the edges
- Complexity is O((|E| + |V|) log |V|), or O(n^2 log n) for a dense graph
  with n = |V| and |E| ≈ |V|^2

Hash Functions
A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1].
Example: h(x) = x mod N is a hash function for integer keys.
The integer h(x) is called the hash value of key x.
The goal of a hash function is to uniformly disperse keys in the range [0, N − 1].
Hash Tables
A hash table for a given key type consists of:
- Hash function h
- Array (called table) of size N
When implementing a dictionary with a hash table, the goal is to store
item (k, o) at index i = h(k).
A collision occurs when two keys in the dictionary have the same hash value,
i.e., h(k) == h(k'), whereas k != k'.
Collision handling schemes:
- Chaining: colliding items are stored in a sequence
- Open addressing: the colliding item is placed in a different cell of the table
Linear Probing
Linear probing handles collisions by placing the colliding item in the next
(circularly) available table cell.
Each table cell inspected is referred to as a "probe".
Colliding items lump together, causing future collisions to produce a longer
sequence of probes.
Example:
- h(x) = x mod 13
- Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order:

  index:  0   1   2   3   4   5   6   7   8   9  10  11  12
  key  :          41          18  44  59  32  22  31  73

Double Hashing
Double hashing uses a secondary hash function d(k) and handles collisions by
placing an item in the first available cell of the series
    (i + j*d(k)) mod N    for j = 0, 1, ..., N − 1
The secondary hash function d(k) cannot have zero values.
The table size N must be a prime to allow probing of all the cells.
Common choice of compression map for the secondary hash function:
    d2(k) = q − k mod q
where
- q < N
- q is a prime
The possible values for d2(k) are 1, 2, ..., q.

Example of Double Hashing
Consider a hash table storing integer keys that handles collision with
double hashing:
- N = 13
- h(k) = k mod 13
- d(k) = 7 − k mod 7
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order:

  k    h(k)  d(k)  Probes
  18     5     3   5
  41     2     1   2
  22     9     6   9
  44     5     5   5 10
  59     7     4   7
  32     6     3   6
  31     5     4   5 9 0
  73     8     4   8

  index:  0   1   2   3   4   5   6   7   8   9  10  11  12
  key  : 31      41          18  32  59  73  22  44
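The double-hashing example can be reproduced in a few lines; the insert routine below is our own sketch of the probe series (i + j*d(k)) mod N, returning the cell it settles in so the probe table above can be checked:

```c
#include <assert.h>

#define N 13
#define EMPTY (-1)

int table[N];

int h(int k) { return k % N; }
int d(int k) { return 7 - k % 7; }   /* secondary hash; never zero */

/* Insert k, probing (h(k) + j*d(k)) mod N for j = 0, 1, ..., N-1.
   Returns the cell index used, or -1 if the table is full. */
int dh_insert(int k) {
    int i = h(k), step = d(k);
    for (int j = 0; j < N; j++) {
        int cell = (i + j * step) % N;
        if (table[cell] == EMPTY) { table[cell] = k; return cell; }
    }
    return -1;
}
```

Inserting 18, 41, 22, 44, 59, 32, 31, 73 lands them in cells 5, 2, 9, 10, 7, 6, 0, 8, matching the table in the example (key 31 needs the three probes 5, 9, 0).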
Collision resolution using Linked Lists:
- Dynamically allocate space.
- Easy to insert/delete an item.
- Need a link for each node in the hash table.
- Searching - same as in a linked list.

Collision resolution using Overflow area:
- Linked list constructed in a special area of the table called the overflow area.
- Example: h(k) == h(j), with k stored first. Adding j:
  - Calculate h(j)
  - Find k
  - Get the first free slot in the overflow area
  - Put j in it
  - k's pointer points to this slot
Summary of Sorting Algorithms

  Algorithm       Time        Notes
  selection-sort  O(n^2)      slow; in-place; for small data sets (< 1K)
  insertion-sort  O(n^2)      slow; in-place; for small data sets (< 1K)
  heap-sort       O(n log n)  fast; in-place; for large data sets (1K — 1M)
  merge-sort      O(n log n)  fast; sequential data access; for huge data sets (> 1M)
Quick-Sort
Quick-sort is a randomized sorting algorithm based on the divide-and-conquer
paradigm:
- Divide: pick a random element x (called the pivot) and partition S into
  - L: elements less than x
  - E: elements equal to x
  - G: elements greater than x
- Recur: sort L and G
- Conquer: join L, E and G

Worst-case Running Time
The worst case for quick-sort occurs when the pivot is the unique minimum or
maximum element.
One of L and G has size n − 1 and the other has size 0.
The running time is proportional to the sum
    n + (n − 1) + ... + 2 + 1
Thus, the worst-case running time of quick-sort is O(n^2).

Expected Running Time
Consider a recursive call of quick-sort on an array of size s:
- Good call: the sizes of L and G are each less than 3s/4
- Bad call: one of L and G has size greater than 3s/4
A call is good with probability 1/2.
Probabilistic Fact: the expected number of coin tosses required in order to
get k heads is 2k.
Hence, for a node of depth i, we expect that:
- i/2 ancestor nodes are associated with good calls
- the size of the input sequence for the current call is at most (3/4)^(i/2) * n
Thus, we have:
- For a node of depth 2 log_{4/3} n, the expected size of the input sequence is one
- The expected height of the quick-sort tree is O(log n)
The overall amount of work done at the nodes of the same depth of the
quick-sort tree is O(n).
Thus, the expected running time of quick-sort is O(n log n).
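The divide/recur/conquer scheme above can be written almost literally with auxiliary L and G arrays (an in-place version would differ only in the partition step; this sketch and its names are ours):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Quick-sort following the L/E/G scheme: random pivot, three-way
   partition, recursive sorts of L and G, then join L, E, G. */
void quick_sort(int* s, int n) {
    if (n <= 1) return;
    int x = s[rand() % n];                 /* divide: random pivot */
    int *L = malloc(n * sizeof(int)), *G = malloc(n * sizeof(int));
    int nl = 0, ne = 0, ng = 0;
    for (int i = 0; i < n; i++) {
        if (s[i] < x) L[nl++] = s[i];
        else if (s[i] == x) ne++;          /* E: just count the copies of x */
        else G[ng++] = s[i];
    }
    quick_sort(L, nl);                     /* recur on L and G */
    quick_sort(G, ng);
    memcpy(s, L, nl * sizeof(int));        /* conquer: join L, E and G */
    for (int i = 0; i < ne; i++) s[nl + i] = x;
    memcpy(s + nl + ne, G, ng * sizeof(int));
    free(L); free(G);
}
```

Grouping the elements equal to the pivot into E is what keeps duplicate keys from triggering the worst case described above.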
Distribution Counting Sort
Suppose we have an array of student records:

  S:  Tom 99 | Mary 73 | Jack 56 | Tim 73 | ...... | Bob 82

Question: sort the array with respect to S[i].grade.

Algorithm:
Function Distribution_counting_sort(S, n){
    Input: a student array S of n records
    Output: a sorted array (wrt grade) NS

    int count[101]; /* init to 0's */
    /* counting */
    for (i = 0; i < n; i++) count[S[i].grade]++;
    /* accumulating */
    count[0]--;
    for (i = 1; i < 101; i++) count[i] = count[i-1] + count[i];
    /* distribution */
    for (i = 0; i < n; i++) NS[count[S[i].grade]--] = S[i];
}
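The pseudocode above becomes runnable C with a small student struct (the struct and function signature are ours). After the accumulating pass, count[g] is the 0-based index of the last free slot for grade g, which the distribution pass then fills from the back:

```c
#include <assert.h>
#include <string.h>

typedef struct { const char* name; int grade; } student;

/* Sort S into NS by grade (grades assumed to lie in 0..100).
   Note: scanning i forward in the distribution pass reverses the
   relative order of equal grades; a stable variant scans i from
   n-1 down to 0 instead. */
void distribution_counting_sort(student* S, student* NS, int n) {
    int count[101];
    memset(count, 0, sizeof(count));                          /* init to 0's */
    for (int i = 0; i < n; i++) count[S[i].grade]++;          /* counting */
    count[0]--;                                               /* make slots 0-based */
    for (int i = 1; i < 101; i++) count[i] += count[i - 1];   /* accumulating */
    for (int i = 0; i < n; i++)                               /* distribution */
        NS[count[S[i].grade]--] = S[i];
}
```

The whole sort is O(n + 101): two linear passes over the records plus one pass over the count array, with no comparisons between records.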
Pattern Matching
The brute-force pattern matching algorithm compares the pattern P with the
text T for each possible shift of P relative to T, until either a match is
found, or all placements of the pattern have been tried.
Brute-force pattern matching runs in time O(nm).

The Boyer-Moore pattern matching algorithm is based on two heuristics:
- Looking-glass heuristic: compare P with a subsequence of T moving backwards
- Character-jump heuristic: when a mismatch occurs at T[i] = c,
  - if P contains c, shift P to align the last occurrence of c in P with T[i];
  - else, shift P to align P[0] with T[i + 1]
Boyer-Moore's algorithm runs in time O(nm + s).

Knuth-Morris-Pratt's algorithm preprocesses the pattern to find matches of
prefixes of the pattern with the pattern itself. KMP's algorithm runs in
optimal time O(m + n).
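The brute-force algorithm described first is a handy baseline; a minimal sketch (the function name is ours), with n = |T| and m = |P| as above:

```c
#include <assert.h>
#include <string.h>

/* Brute force: try every shift of P in T; at most (n - m + 1) shifts,
   each costing up to m comparisons, hence O(nm) time.
   Returns the index of the first match, or -1 if there is none. */
int brute_force_match(const char* T, const char* P) {
    int n = strlen(T), m = strlen(P);
    for (int shift = 0; shift <= n - m; shift++) {
        int j = 0;
        while (j < m && T[shift + j] == P[j]) j++;
        if (j == m) return shift;      /* match found at this shift */
    }
    return -1;                         /* all placements tried */
}
```

For example, searching for "cad" in "abracadabra" succeeds at shift 4.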