CIS 2520 Data Structures: Review

Linked List

A linked list represents a sequence: every node but one has a predecessor, and every node but one has a successor.

    /* recursive definition */
    typedef struct node {
        int data;               /* whatever is useful in the node */
        struct node *next;      /* link to the next node */
    } node;

Recursive versions of print_list:

    /* print list recursively (from head to tail) */
    void print_list_1(node *p) {
        if (p) {
            printf("data: %d\n", p->data);
            print_list_1(p->next);
        }
    }

    /* print list recursively (from tail back to head) */
    void print_list_2(node *p) {
        if (p) {
            print_list_2(p->next);
            printf("data: %d\n", p->data);
        }
    }

Ordered Linked List

Insert nodes in their sorted position, where d1 <= d2 <= d3 <= ... <= dn.

Question: how do we insert dk into the list? Three cases arise:
- Case 1: head == NULL (the list is empty).
- Case 2: dk should be inserted in front of the list (dk <= d1).
- Case 3: dk should be inserted after some node (e.g., d2 <= dk <= d3).

Version 1 (track the previous node explicitly):

    while (curr && data > curr->data) {
        prev = curr;
        curr = curr->next;
    }

Version 2 (put a dummy node in front of the head, so all three cases are handled by one loop):

    while (p->next && data > p->next->data)
        p = p->next;

Doubly Linked List

A list which can be traversed either forward or backward.

Binary Search

Binary search finds a target value in a sorted table. We start by comparing the target with the table's middle element; since the table is sorted, if the target is larger we can ignore all values smaller than the middle element, and vice versa. We stop when we have found the target or no values are left to search.
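Returning to the ordered linked list above: the three insertion cases collapse into one loop when a dummy header node is used (Version 2). A minimal sketch, assuming the same node typedef; the function name insert_sorted is ours, not from the slides:

```c
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;
} node;

/* Insert data into an ascending-sorted list; returns the (possibly new) head.
   A stack-allocated dummy node in front of the real head means the empty-list
   and front-insert cases need no special handling. */
node *insert_sorted(node *head, int data) {
    node dummy;
    dummy.next = head;
    node *p = &dummy;
    while (p->next && data > p->next->data)   /* Version 2 loop from the notes */
        p = p->next;
    node *n = malloc(sizeof(node));
    n->data = data;
    n->next = p->next;                        /* splice in after p */
    p->next = n;
    return dummy.next;                        /* dummy.next is the real head */
}
```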
The node of a doubly linked list carries both links:

    typedef struct node {
        int data;
        struct node *next;      /* link to the next node */
        struct node *prev;      /* link to the previous node */
    } node;

The binary search function (corrected: the loop must also handle min == max, and the midpoint is recomputed from the current range rather than by repeatedly halving an offset):

    int *bsearch(int target, int *table, int n) {
        int *min = table;
        int *max = table + n - 1;
        while (min <= max) {
            int *mid = min + (max - min) / 2;
            if (target == *mid)
                return mid;
            else if (target > *mid)
                min = mid + 1;
            else
                max = mid - 1;
        }
        return NULL;
    }

Example: searching for 36 in the sorted table 2 5 8 11 15 17 22 25 34 36 43 57 59 66 72 probes 25, then 57, then 36.

Three Characteristics of Recursion
- Calls itself recursively.
- Has some terminating condition.
- Moves "closer" to the terminating condition.

Useful Recursion

To be useful the recursion must terminate, so there must be at least one non-recursive case (such as 0!) as well as recursive cases (such as n * (n - 1)!).

Big-Oh Rules

If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e.:
1. drop lower-order terms;
2. drop constant factors.

Use the smallest possible class of functions: say "2n is O(n)" instead of "2n is O(n^2)".
Use the simplest expression of the class: say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)".

Properties of the O Notation
- Constant factors may be ignored: for all k > 0, kf is O(f).
- The fastest-growing term dominates a sum: if f is O(g) and h is O(r), then f + h is O(max(g, r)); e.g., an^4 + log n is O(max(n^4, log n)) = O(n^4).
- A polynomial's growth rate is determined by its leading term: if f is a polynomial of degree d, then f is O(n^d); e.g., 10n^4 + 5n^6 + n^2 is O(n^6).
- f is O(g) is transitive: if f is O(g) and g is O(h), then f is O(h).
- The product of upper bounds is an upper bound for the product: if f is O(g) and h is O(r), then f*h is O(g*r).
- All logarithms grow at the same rate: log_b n is O(log_d n) for all b, d > 1.

Abstract Data Types (ADTs)

An abstract data type (ADT) is an abstraction of a data structure. "ADT" refers to a way of packaging some intermediate-level data structures and their operations into a useful collection whose properties have been carefully studied.
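The three characteristics of recursion above are all visible in the factorial example the notes mention (0! as the non-recursive case, n * (n - 1)! as the recursive case). A minimal sketch:

```c
/* factorial illustrates the three characteristics of recursion:
   it calls itself, it has a terminating (non-recursive) case 0! = 1,
   and every call moves closer to that case (n decreases by 1). */
unsigned long factorial(unsigned int n) {
    if (n == 0)                        /* non-recursive case: 0! = 1 */
        return 1;
    return n * factorial(n - 1);       /* recursive case: n * (n - 1)! */
}
```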
An ADT specifies:
- the data stored;
- the operations on the data (a clean, simple interface);
- the error conditions associated with the operations.

Algorithm Analysis

The analysis of an algorithm determines its running time in big-Oh notation. To perform the analysis:
- find the worst-case number of primitive operations executed as a function of the input size;
- express this function with big-Oh notation.

Example: we determine that algorithm arrayMax executes at most 6n − 2 primitive operations, and we say that arrayMax "runs in O(n) time". Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations.

Reasonable vs. Unreasonable
- Reasonable algorithms have polynomial factors: O(log N), O(N), O(N^K), where K is a constant.
- Unreasonable algorithms have exponential factors: O(2^N), O(N!), O(N^N).

The Stack and Queue ADTs

The Stack ADT stores arbitrary elements; insertions and deletions follow the LIFO scheme. Main stack operations:
- push(element): inserts an element;
- element pop(): removes and returns the last inserted element.

The Queue ADT stores arbitrary elements; insertions and deletions follow the FIFO scheme. Main queue operations:
- enqueue(element): inserts an element at the end of the queue;
- element dequeue(): removes and returns the element at the front of the queue.

Binary Tree

A binary tree is a tree in which each internal node has two children, given as an ordered pair. (Example figure with nodes A–I omitted.)

Heap

A heap is a binary tree storing keys at its internal nodes and satisfying the heap-order and complete-binary-tree properties.
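The Stack ADT above can be sketched with a fixed-capacity array. The operation names push/pop follow the ADT; the capacity limit and the -1 error convention are our own choices, illustrating the "error conditions" part of an ADT specification:

```c
/* A minimal fixed-capacity array stack illustrating the LIFO Stack ADT. */
#define CAP 16

typedef struct {
    int items[CAP];
    int top;                   /* number of elements currently stored */
} Stack;

void stack_init(Stack *s) { s->top = 0; }

int push(Stack *s, int x) {    /* returns 0 on success, -1 if full */
    if (s->top == CAP) return -1;
    s->items[s->top++] = x;
    return 0;
}

int pop(Stack *s, int *out) {  /* removes and returns the last inserted element */
    if (s->top == 0) return -1;        /* error condition: empty stack */
    *out = s->items[--s->top];
    return 0;
}
```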
What Is a Heap
- Heap-Order: for every internal node v other than the root, key(v) ≥ key(parent(v)).
- Complete Binary Tree: let h be the height of the heap; then
  - for i = 0, ..., h − 1, there are 2^i nodes of depth i;
  - at depth h − 1, the internal nodes are to the left of the external nodes.

The last node of a heap is the rightmost internal node of depth h − 1.

Upheap

After the insertion of a new key k, the heap-order property may be violated. Algorithm upheap restores the heap-order property by swapping k along an upward path from the insertion node. Upheap terminates when the key k reaches the root or a node whose parent has a key smaller than or equal to k. Since a heap has height O(log n), upheap runs in O(log n) time.

Downheap

After replacing the root key with the key k of the last node, the heap-order property may be violated. Algorithm downheap restores the heap-order property by swapping key k along a downward path from the root. Downheap terminates when key k reaches a leaf or a node whose children have keys greater than or equal to k. Since a heap has height O(log n), downheap runs in O(log n) time.

Binary Tree

A binary tree is a tree with the following properties:
- each internal node has two children;
- the children of a node are an ordered pair.

We call the children of an internal node the left child and the right child. Alternative recursive definition: a binary tree is either
- a tree consisting of a single node, or
- a tree whose root has an ordered pair of children, each of which is a binary tree.

Applications: arithmetic expressions, decision processes, searching.

Binary Tree Traversal

    function preOrder(v)
        visit(v)
        if (isInternal(v)) { preOrder(leftChild(v)) }
        if (isInternal(v)) { preOrder(rightChild(v)) }

    function inOrder(v)
        if (isInternal(v)) { inOrder(leftChild(v)) }
        visit(v)
        if (isInternal(v)) { inOrder(rightChild(v)) }

    function postOrder(v)
        if (isInternal(v)) { postOrder(leftChild(v)) }
        if (isInternal(v)) { postOrder(rightChild(v)) }
        visit(v)
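The upheap/downheap descriptions above can be sketched on an array-based heap, a common representation the slides do not show: the children of index i live at 2i + 1 and 2i + 2. Heap-order here means each key is ≥ its parent's key (a min-heap), matching the slides' key(v) ≥ key(parent(v)):

```c
static void hswap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Restore heap-order after inserting at index i, swapping upward. */
void upheap(int h[], int i) {
    while (i > 0 && h[i] < h[(i - 1) / 2]) {
        hswap(&h[i], &h[(i - 1) / 2]);
        i = (i - 1) / 2;                  /* move up toward the root */
    }
}

/* Restore heap-order from index i downward (e.g., after replacing the root). */
void downheap(int h[], int n, int i) {
    for (;;) {
        int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[l] < h[smallest]) smallest = l;
        if (r < n && h[r] < h[smallest]) smallest = r;
        if (smallest == i) break;         /* both children >= key: done */
        hswap(&h[i], &h[smallest]);
        i = smallest;
    }
}
```

Both loops walk one root-to-leaf path at most once, which is the O(log n) bound stated above.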
(Heap insertion/upheap and removal/downheap example figures omitted.)

Binary Search Tree

A binary search tree is a binary tree storing keys (or key-element pairs) at its internal nodes and satisfying the following property: let v be a tree node, and let L and R be the left and right subtrees of v; then keys(L) ≤ key(v) ≤ keys(R). External nodes do not store items (they are NULLs). An inorder traversal of a binary search tree visits the keys in increasing order. (Example: the tree with root 6, children 2 and 9, and leaves 1, 4, 8.)

AVL Tree

AVL trees are balanced. An AVL tree is a binary search tree such that, for every internal node v of T, the heights of the children of v differ by at most 1. (Example figure of an AVL tree, with the heights shown next to the nodes, omitted.)

Cut/Link Restructure Algorithm

If an insertion(w) causes T to become unbalanced, we travel up the tree from the newly created node until we find the first node x such that its grandparent z is unbalanced. If a remove(w) can cause T to become unbalanced, let z be the first unbalanced node encountered while traveling up the tree from w. The restructuring proceeds as follows:
1. Number the 7 parts (the nodes x, y, z and their four subtrees T0–T3) by doing an in-order traversal; note that x, y and z are renamed (a, b, c) based upon their order within the traversal.
2. Cut x, y and z, in that order (child, parent, grandparent), and place them at their inorder ranks in the array.
3. Re-link these subtrees to the main tree; for example, rank 4 (b) is linked in where the subtree's root formerly was.

In effect we make a new, balanced tree and put the 7 parts from the old tree into it so that the numbering is still correct when we do an in-order traversal of the new tree. This works regardless of how the tree is originally unbalanced.
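The binary search tree property above, and the fact that an inorder traversal visits keys in increasing order, can be sketched as follows (function names are ours; the slide's example keys 6, 2, 9, 1, 4, 8 are used in the check):

```c
#include <stdlib.h>

/* Minimal BST sketch: keys(L) <= key(v) <= keys(R). */
typedef struct bst {
    int key;
    struct bst *left, *right;
} bst;

bst *bst_insert(bst *root, int key) {
    if (!root) {
        bst *n = malloc(sizeof(bst));
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

/* Append keys in inorder into out[], starting at index k; returns new count.
   Because of the BST property, out[] ends up in increasing order. */
int bst_inorder(bst *root, int out[], int k) {
    if (!root) return k;
    k = bst_inorder(root->left, out, k);
    out[k++] = root->key;
    return bst_inorder(root->right, out, k);
}
```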
In the cut/link restructuring, let y be the child of z with the larger height, and let x be the child of y with the larger height. To rebalance the subtree rooted at z, we must perform the restructuring.

(2,4) Tree

A (2,4) tree (also called a 2-4 tree or 2-3-4 tree) is a multi-way search tree with the following properties:
- Node-Size Property: every internal node has at most four children.
- Depth Property: all the external nodes have the same depth.

Depending on its number of children, an internal node of a (2,4) tree is called a 2-node, 3-node or 4-node.

Insertion: we insert a new item (k, o) at the parent v of the leaf reached by searching for k. We preserve the depth property, but we may cause an overflow (i.e., node v may become a 5-node). Example: inserting key 30 causes an overflow.

Overflow and Split

We handle an overflow at a 5-node v with a split operation. Let v1 ... v5 be the children of v and k1 ... k4 be the keys of v:
- node v is replaced by nodes v' and v'';
- v' is a 3-node with keys k1, k2 and children v1, v2, v3;
- v'' is a 2-node with key k4 and children v4, v5;
- key k3 is inserted into the parent u of v (a new root may be created).

The overflow may propagate to the parent node u.

Huffman Encoding Trie

Greedy approach:
- sort characters by frequency;
- form the two lowest-weight nodes into a sub-tree (sub-tree weight = sum of the weights of its nodes);
- move the new tree to its correct place, and repeat.

Example: X = abracadabra, with frequencies a: 5, b: 2, c: 1, d: 1, r: 2.

Graph

A graph is a pair (V, E), where:
- V is a set of nodes, called vertices;
- E is a collection of pairs of vertices, called edges.

Vertices and edges are nodes and store elements.

Notation: n is the number of vertices, m the number of edges, and deg(v) the degree of vertex v.

Property 1: Σv deg(v) = 2m. Proof: each endpoint is counted twice.

Property 2: in an undirected graph with no self-loops and no multiple edges, m ≤ n(n − 1)/2. Proof: each vertex has degree at most n − 1.
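Property 1 above (the degree sum equals 2m) can be checked mechanically: tally degrees from an edge list and sum them. The edge-list encoding and the small fixed vertex bound are our own choices for this sketch:

```c
/* Sum of vertex degrees for an undirected graph given as an edge list.
   By Property 1, the result always equals 2m. Assumes n <= 16. */
int degree_sum(int n, int m, int edges[][2]) {
    int deg[16] = {0};
    for (int e = 0; e < m; e++) {
        deg[edges[e][0]]++;        /* each edge contributes to */
        deg[edges[e][1]]++;        /* ...both of its endpoints */
    }
    int sum = 0;
    for (int v = 0; v < n; v++)
        sum += deg[v];
    return sum;
}
```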
Graphs – Data Structures

Vertices: store the vertices in an array, mapping them to consecutive integers 0, 1, 2, ...

Edges – adjacency matrix:
- booleans: 1 = edge exists, 0 = no edge;
- O(|V|^2) space (where |V| is the number of vertices).

Edges – adjacency lists:
- for each vertex, a list of the vertices "attached" to it;
- O(|E|) space, and therefore better for sparse graphs.

Spanning Trees

A spanning tree of G is a subgraph that contains all the vertices of G. A spanning tree of a connected graph is a spanning subgraph that is a tree. A spanning tree is not unique unless the graph is a tree.

Depth-First Search

Depth-first search (DFS) is a general technique for traversing a graph. A DFS traversal of a graph G:
- visits all the vertices and edges of G;
- determines whether G is connected;
- computes the connected components of G;
- computes a spanning forest of G.

DFS on a graph with n vertices and m edges takes O(n + m) time. DFS can be further extended to solve other graph problems:
- find and report a path between two given vertices;
- find a cycle in the graph.

Shortest Paths: Dijkstra's Algorithm

For a graph G = (V, E), Dijkstra's algorithm keeps two sets of vertices:
- S: the vertices whose shortest paths have already been determined;
- V − S: the remainder.

It also keeps d (the best estimate of the shortest path to each vertex) and π (the predecessor of each vertex).

Operation:
- Initialise d and π: for each vertex j in V, dj = ∞ and πj = nil (initial estimates are all ∞, no connections); set the source distance ds = 0.
- Set S to empty. (Add s first!)
- While V − S is not empty:
  - sort V − S based on d;
  - add u, the closest vertex in V − S, to S;
  - relax all the vertices still in V − S connected to u.
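The DFS traversal described above can be sketched over the adjacency-matrix representation: counting the vertices a DFS reaches from a start vertex also answers the connectivity question (the count equals n exactly when G is connected). The fixed 5-vertex encoding is ours:

```c
#define NV 5

/* Recursive DFS over an adjacency matrix, marking reachable vertices. */
void dfs(int adj[NV][NV], int v, int visited[NV]) {
    visited[v] = 1;
    for (int u = 0; u < NV; u++)
        if (adj[v][u] && !visited[u])
            dfs(adj, u, visited);          /* follow each unexplored edge */
}

/* Number of vertices reachable from start; == NV iff the graph is connected. */
int count_reachable(int adj[NV][NV], int start) {
    int visited[NV] = {0};
    dfs(adj, start, visited);
    int c = 0;
    for (int v = 0; v < NV; v++) c += visited[v];
    return c;
}
```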
Dijkstra's Algorithm – Time Complexity
- Similar to the MST algorithms; the key step is the sort on the edges.
- Complexity is O((|E| + |V|) log |V|), or O(n^2 log n) for a dense graph with n = |V| and |E| ≈ |V|^2.

Hash Functions

A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]. Example: h(x) = x mod N is a hash function for integer keys. The integer h(x) is called the hash value of key x. The goal of a hash function is to uniformly disperse keys in the range [0, N − 1].

Hash Tables

A hash table for a given key type consists of:
- a hash function h;
- an array (called the table) of size N.

When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k). A collision occurs when two keys in the dictionary have the same hash value, i.e., h(k) == h(k') whereas k != k'. Collision handling schemes:
- chaining: colliding items are stored in a sequence;
- open addressing: the colliding item is placed in a different cell of the table.

Linear Probing

Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell. Each table cell inspected is referred to as a "probe". Colliding items lump together, causing future collisions to produce longer sequences of probes.

Example: with h(x) = x mod 13, inserting the keys 18, 41, 22, 44, 59, 32, 31, 73 (in this order) fills the table as follows:

    index: 0   1   2   3   4   5   6   7   8   9   10  11  12
    key:           41          18  44  59  32  22  31  73

Double Hashing

Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series (i + j·d(k)) mod N, for j = 0, 1, ..., N − 1. The secondary hash function d(k) cannot have zero values. The table size N must be a prime to allow probing of all the cells.
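The double-hashing probe series can be sketched directly for the worked example that follows (N = 13, h(k) = k mod 13, d(k) = 7 − k mod 7). The empty-cell sentinel −1 and the function name are our own choices:

```c
#define N 13
#define EMPTY (-1)

/* Insert k using double hashing: probe (h(k) + j*d(k)) mod N for
   j = 0, 1, ..., N-1. Returns the cell used, or -1 if the table is full. */
int dh_insert(int table[N], int k) {
    int h = k % N;
    int d = 7 - k % 7;                 /* secondary hash; never zero */
    for (int j = 0; j < N; j++) {
        int cell = (h + j * d) % N;
        if (table[cell] == EMPTY) {
            table[cell] = k;
            return cell;
        }
    }
    return -1;
}
```

Running it on the example key sequence reproduces the probe table shown below.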
Example of Double Hashing

A common choice of compression map for the secondary hash function is d2(k) = q − k mod q, where q < N and q is a prime. The possible values for d2(k) are then 1, 2, ..., q.

Consider a hash table storing integer keys that handles collisions with double hashing, with N = 13, h(k) = k mod 13, and d(k) = 7 − k mod 7. Inserting the keys 18, 41, 22, 44, 59, 32, 31, 73 (in this order):

    k     h(k)  d(k)  probes
    18     5     3     5
    41     2     1     2
    22     9     6     9
    44     5     5     5, 10
    59     7     4     7
    32     6     3     6
    31     5     4     5, 9, 0
    73     8     4     8

    index: 0   1   2   3   4   5   6   7   8   9   10  11  12
    key:   31      41          18  32  59  73  22  44

Collision Resolution Using an Overflow Area
- Linked lists are constructed in a special area of the table called the overflow area.
- Space is allocated dynamically, and it is easy to insert or delete an item.
- A link is needed for each node in the hash table.

Suppose h(k) == h(j) and k is stored first. Adding j:
- calculate h(j);
- find k;
- get the first slot in the overflow area and put j in it;
- make k's pointer point to this slot.

Searching works the same way as in a linked list.

Summary of Sorting Algorithms

    Algorithm       Time        Notes
    selection-sort  O(n^2)      slow, in-place; for small data sets (< 1K)
    insertion-sort  O(n^2)      slow, in-place; for small data sets (< 1K)
    heap-sort       O(n log n)  fast, in-place; for large data sets (1K – 1M)
    merge-sort      O(n log n)  fast, sequential data access; for huge data sets (> 1M)

Quick-Sort

Quick-sort is a randomized sorting algorithm based on the divide-and-conquer paradigm.
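A compact in-place sketch of quick-sort, assuming a Lomuto-style partition with the last element as pivot; the slides use a random pivot and a three-way L/E/G partition, so this is a simplification of that scheme, not the slides' exact algorithm:

```c
static void qswap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Sort a[lo..hi] in place. */
void quick_sort(int a[], int lo, int hi) {
    if (lo >= hi) return;              /* 0 or 1 elements: already sorted */
    int pivot = a[hi], p = lo;
    for (int i = lo; i < hi; i++)      /* move elements < pivot to the left */
        if (a[i] < pivot)
            qswap(&a[i], &a[p++]);
    qswap(&a[p], &a[hi]);              /* pivot lands at its final position */
    quick_sort(a, lo, p - 1);          /* recur on the "L" part */
    quick_sort(a, p + 1, hi);          /* recur on the "G" part */
}
```

With a fixed pivot the O(n^2) worst case described below is triggered by already-sorted input, which is exactly why the slides pick the pivot at random.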
The quick-sort outline:
- Divide: pick a random element x (called the pivot) and partition S into
  - L: the elements less than x,
  - E: the elements equal to x,
  - G: the elements greater than x.
- Recur: sort L and G.
- Conquer: join L, E and G.

Worst-Case Running Time

The worst case for quick-sort occurs when the pivot is the unique minimum or maximum element: one of L and G has size n − 1 and the other has size 0. The running time is then proportional to the sum n + (n − 1) + ... + 2 + 1, so the worst-case running time of quick-sort is O(n^2).

Expected Running Time

Consider a recursive call of quick-sort on an array of size s:
- good call: the sizes of L and G are each less than 3s/4;
- bad call: one of L and G has size greater than 3s/4.

A call is good with probability 1/2. Probabilistic fact: the expected number of coin tosses required in order to get k heads is 2k. Hence, for a node of depth i, we expect that:
- i/2 ancestor nodes are associated with good calls;
- the size of the input sequence for the current call is at most (3/4)^(i/2) · n.

Thus:
- for a node of depth 2·log_{4/3} n, the expected size of the input sequence is one;
- the expected height of the quick-sort tree is O(log n).

The overall amount of work done at the nodes of the same depth of the quick-sort tree is O(n). Thus, the expected running time of quick-sort is O(n log n).

Distribution Counting Sort

Suppose we have an array S of student records (Tom 99, Mary 73, Jack 56, Tim 73, ..., Bob 82). Question: sort the array with respect to S[i].grade.

    /* Input: a student array S of n records
       Output: a sorted array (wrt grade) NS */
    void distribution_counting_sort(student S[], student NS[], int n) {
        int count[101] = {0};            /* grades 0..100, initialised to 0 */

        /* counting */
        for (int i = 0; i < n; i++)
            count[S[i].grade]++;

        /* accumulating: count[g] becomes the last index for grade g */
        count[0]--;
        for (int i = 1; i < 101; i++)
            count[i] = count[i - 1] + count[i];

        /* distribution */
        for (int i = 0; i < n; i++)
            NS[count[S[i].grade]--] = S[i];
    }

Pattern Matching

The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either a match is found or all placements of the pattern have been tried.
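The brute-force scheme just described can be sketched directly; the function name and the -1 "no match" convention are ours:

```c
#include <string.h>

/* Try every shift s of P over T; return the first shift where all m
   characters of P match, or -1 if no placement works. */
int brute_force_match(const char *T, const char *P) {
    int n = (int)strlen(T), m = (int)strlen(P);
    for (int s = 0; s + m <= n; s++) {        /* each possible shift */
        int j = 0;
        while (j < m && T[s + j] == P[j])
            j++;
        if (j == m) return s;                 /* all m characters matched */
    }
    return -1;
}
```

The two nested loops make the O(nm) bound stated below immediate.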
Brute-force pattern matching runs in O(nm) time.

The Boyer-Moore pattern matching algorithm is based on two heuristics:
- looking-glass heuristic: compare P with a subsequence of T moving backwards;
- character-jump heuristic: when a mismatch occurs at T[i] = c,
  - if P contains c, shift P to align the last occurrence of c in P with T[i];
  - else, shift P to align P[0] with T[i + 1].

Boyer-Moore's algorithm runs in O(nm + s) time.

Knuth-Morris-Pratt's algorithm preprocesses the pattern to find matches of prefixes of the pattern with the pattern itself. KMP's algorithm runs in optimal O(m + n) time.
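The KMP preprocessing the notes mention ("matches of prefixes of the pattern with the pattern itself") is usually stored as a failure table: fail[i] is the length of the longest proper prefix of P that is also a suffix of P[0..i]. A sketch, with a fixed pattern-length bound of our own choosing:

```c
#include <string.h>

/* KMP search: returns the first index where P occurs in T, or -1.
   Assumes the pattern length m < 64 for this sketch. */
int kmp_match(const char *T, const char *P) {
    int n = (int)strlen(T), m = (int)strlen(P);
    if (m == 0) return 0;
    int fail[64];
    fail[0] = 0;
    /* Preprocessing: match the pattern against itself, O(m). */
    for (int k = 0, i = 1; i < m; i++) {
        while (k > 0 && P[i] != P[k]) k = fail[k - 1];
        if (P[i] == P[k]) k++;
        fail[i] = k;
    }
    /* Scan: on a mismatch, fall back via fail[] instead of re-shifting, O(n). */
    for (int k = 0, i = 0; i < n; i++) {
        while (k > 0 && T[i] != P[k]) k = fail[k - 1];
        if (T[i] == P[k]) k++;
        if (k == m) return i - m + 1;
    }
    return -1;
}
```

Because the text index i never moves backwards, the scan is O(n) and the total is the optimal O(m + n) stated above.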