Searching: Hash Tables
Chapter 12
6/9/15
Adapted from instructor resource slides: Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3

Info:
• Still grading exams. Will review answers Thursday
• Review how to handle issues with project grading
• Review Thursday's material
• Hashing (new)
• break
• Start sorting
• Review project 2

Evolution of Reusability, Genericity
• Major theme in development of programming languages
  – Reuse code
  – Avoid repeatedly reinventing the wheel
• Trend contributing to this
  – Use of generic code
  – Can be used with different types of data

Function Genericity: Overloading and Templates
• Initially code was reusable by encapsulating it within functions
• Example: lines of code to swap values stored in two variables
  – Instead of rewriting those 3 lines
  – Place in a function

    void swap(int & first, int & second)
    {
        int temp = first;
        first = second;
        second = temp;
    }

  – Then call swap(x, y);

Template Mechanism
• Declare a type parameter
  – also called a type placeholder
• Use it in the function instead of a specific type.
  – This requires a different kind of parameter list:

    void Swap(______ & first, ______ & second)
    {
        ________ temp = first;
        first = second;
        second = temp;
    }
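One way to fill in the blanks on the Template Mechanism slide is sketched below. The type-parameter name Item is an assumption chosen for illustration; any identifier would work.

```cpp
#include <string>

// Function template: Item is a type parameter (placeholder, hypothetical
// name) that the compiler replaces with the actual argument type at each
// call site.
template <typename Item>
void Swap(Item & first, Item & second)
{
    Item temp = first;
    first = second;
    second = temp;
}
```

Calling Swap(x, y) with two ints and again with two strings makes the compiler generate two separate functions from this single definition.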
Instantiating Class Templates
• Instantiate it by using declaration of form
    ClassName<Type> object;
• Passes Type as an argument to the class template definition.
• Examples:
    Stack<int> intSt;
    Stack<string> stringSt;
• Compiler will generate two distinct definitions of Stack
  – two instances: one for ints and one for strings.

STL (Standard Template Library)
• A library of class and function templates
Components:
1. Containers:
  • Generic "off-the-shelf" class templates for storing collections of data
2. Algorithms:
  • Generic "off-the-shelf" function templates for operating on containers
3. Iterators:
  • Generalized "smart" pointers that allow algorithms to operate on almost any container

The vector Container
• A type-independent pattern for an array class
  – capacity can expand
  – self contained
• Declaration
    template <typename T>
    class vector
    { . . . } ;

vector Operations
• Information about a vector's contents
  – v.size()
  – v.empty()
  – v.capacity()
  – v.reserve()
• Adding, removing, accessing elements
  – v.push_back()
  – v.pop_back()
  – v.front()
  – v.back()
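The operations listed above can be seen together in a short sketch. The function name buildSample and the values stored are assumptions made for illustration only.

```cpp
#include <vector>

// Builds a small vector and exercises the operations listed above.
// buildSample is a hypothetical helper, not part of the STL.
std::vector<int> buildSample()
{
    std::vector<int> v;          // starts empty: v.size() == 0
    v.reserve(8);                // ask for room for at least 8 elements
    for (int i = 1; i <= 5; ++i)
        v.push_back(i * 10);     // append 10, 20, 30, 40, 50
    v.pop_back();                // remove the last element (50)
    return v;
}
```

After the call, the vector holds 10, 20, 30, 40, so front() is 10, back() is 40, and size() is 4.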
Increasing Capacity of a Vector
• When vector v becomes full
  – capacity increased automatically when item added
• Algorithm to increase capacity of vector<T>
  – Allocate new array to store vector's elements
  – Use T copy constructor to copy existing elements to new array
  – Store item being added in new array
  – Destroy old array in vector<T>
  – Make new array the vector<T>'s storage array

Iterators
• Each STL container declares an iterator type
  – can be used to define iterator objects
  – Iterators are a generalization of pointers that allow a C++ program to work with different data structures (containers) in a uniform manner
• To declare an iterator object
  – the identifier iterator must be preceded by
    • name of container
    • scope operator ::
• Example:
    vector<int>::iterator vecIter = v.begin();
• Would define vecIter as an iterator positioned at the first element of v

Iterators
Contrast use of subscript vs. use of iterator

    ostream & operator<<(ostream & out, const vector<double> & v)
    {
        for (int i = 0; i < v.size(); i++)
            out << v[i] << " ";
        return out;
    }

    // Iterator version of the loop (const_iterator, because v is const)
    for (vector<double>::const_iterator it = v.begin(); it != v.end(); it++)
        out << *it << " ";

Iterator Functions
• Note Table 9-5
• Note the capability of the last two groupings
  – Possible to insert, erase elements of a vector anywhere in the vector
  – Must use iterators to do this
  – Note also these operations are as inefficient as for arrays due to the shifting required
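Inserting and erasing in the middle of a vector through iterators, as noted above, might look like the following sketch. The helper names insertAt and eraseAt are assumptions, and each one presumes pos is a valid position in the vector.

```cpp
#include <vector>

// Hypothetical helper: returns a copy of v with value inserted before
// index pos. Assumes 0 <= pos <= v.size().
std::vector<int> insertAt(std::vector<int> v, int pos, int value)
{
    v.insert(v.begin() + pos, value);   // later elements shift right
    return v;
}

// Hypothetical helper: returns a copy of v with the element at index
// pos removed. Assumes 0 <= pos < v.size().
std::vector<int> eraseAt(std::vector<int> v, int pos)
{
    v.erase(v.begin() + pos);           // later elements shift left
    return v;
}
```

Both calls take an iterator (v.begin() + pos) rather than an index, and both must move every element after the affected position, which is the shifting cost the slide mentions.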
Contrast Vectors and Arrays
Vectors
• Capacity can increase
• A self contained object
• Is a class template (no specific type)
• Has function members to do tasks
Arrays
• Fixed size, cannot be changed during execution
• Cannot "operate" on itself
• Bound to specific type
• Must "re-invent the wheel" for most actions

STL's deque Class Template
• Has the same operations as vector<T> except …
  – there is no capacity() and no reserve()
• Has two new operations:
  – d.push_front(value); Push copy of value at front of d
  – d.pop_front(); Remove value at the front of d

vector vs. deque
vector
• Capacity of a vector must be increased
• It must copy the objects from the old vector to the new vector
• It must destroy each object in the old vector
• A lot of overhead!
deque
• With deque this copying, creating, and destroying is avoided.
• Once an object is constructed, it can stay in the same memory locations as long as it exists
  – If insertions and deletions take place at the ends of the deque.

vector vs. deque
• Unlike vectors, a deque isn't stored in a single varying-sized block of memory, but rather in a collection of fixed-size blocks (typically, 4K bytes).
• One of its data members is essentially an array map whose elements point to the locations of these blocks.
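The claim that a deque element stays at the same memory location while insertions happen at the ends can be checked with a small sketch. The function name elementStaysPut and the count of 1000 insertions are assumptions chosen for illustration.

```cpp
#include <deque>

// Hypothetical check: returns true if the address of an existing element
// is unchanged after many insertions at both ends of the deque.
bool elementStaysPut()
{
    std::deque<int> d;
    d.push_back(42);
    const int * before = &d.front();    // remember where 42 lives

    for (int i = 0; i < 1000; ++i)
    {
        d.push_front(i);                // grow at the front
        d.push_back(i);                 // grow at the back
    }

    // After 1000 push_front calls, 42 sits at index 1000.
    return before == &d[1000] && *before == 42;
}
```

The same experiment with a vector and push_back would generally fail, because each reallocation copies the elements into a new array.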
Linear Search
Vector based search function

    template <typename t>
    void LinearSearch(const vector<t> &v, const t &item, bool &found, int &loc)
    {
        found = false;
        loc = 0;
        // Stop when the item is found or every element has been examined.
        while (loc < (int)v.size() && !found)
        {
            if (item == v[loc])
                found = true;
            else
                loc++;
        }
    }

Binary Search
Binary search function for vector (v must be sorted in ascending order)

    template <typename t>
    void BinarySearch(const vector<t> &v, const t &item, bool &found, int &loc)
    {
        found = false;
        int first = 0;
        int last = (int)v.size() - 1;
        while (first <= last && !found)
        {
            loc = (first + last) / 2;
            if (item < v[loc])
                last = loc - 1;      // continue in the left half
            else if (item > v[loc])
                first = loc + 1;     // continue in the right half
            else /* item == v[loc] */
                found = true;
        }
    }

Binary Search
• Usually outperforms a linear search
• Disadvantage:
  – Requires a sequential storage
  – Not appropriate for linked lists (Why?)
• It is possible to use a linked structure which can be searched in a binary-like manner

Trees
• Tree terminology
  – Root node
  – Leaf nodes
  – Children of the parent (3)
  – Siblings to each other
Binary Trees
• Each node has at most two children
• Useful in modeling processes where
  – a comparison or experiment has exactly two possible outcomes
  – the test is performed repeatedly
• Example
  – multiple coin tosses
  – encoding/decoding messages in dots and dashes such as Morse code

Array Representation of Binary Trees
• Works OK for complete trees, not for sparse trees

Linked Representation of Binary Trees
• Uses space more efficiently
• Provides additional flexibility
• Each node has two links
  – one to the left child of the node
  – one to the right child of the node
  – if no child node exists for a node, the link is set to NULL

Binary Trees as Recursive Data Structures
• A binary tree is either empty … (anchor)
or
• Consists of (inductive step)
  – a node called the root
  – root has pointers to two disjoint binary (sub)trees called …
    • right (sub)tree, which is either empty … or …
    • left (sub)tree, which is either empty … or …
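The recursive definition above maps directly onto a recursive function. The sketch below assumes a minimal node type holding an int; the names BinNode, data, left, right, and countNodes are illustrative and are not taken from the textbook's class template.

```cpp
// Minimal linked node, assumed for illustration: two links, each of which
// is null when the corresponding child does not exist.
struct BinNode
{
    int data;
    BinNode * left;
    BinNode * right;
    BinNode(int d) : data(d), left(nullptr), right(nullptr) {}
};

// Counts the nodes of the tree rooted at root.
int countNodes(const BinNode * root)
{
    if (root == nullptr)    // anchor: the empty tree has no nodes
        return 0;
    // inductive step: the root plus the nodes of its two disjoint subtrees
    return 1 + countNodes(root->left) + countNodes(root->right);
}
```

The function has one case per clause of the definition: the empty tree ends the recursion, and a nonempty tree recurses into its left and right subtrees.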
ADT Binary Search Tree (BST)
• Collection of Data Elements
  – binary tree
  – for each node x,
    • value in left child of x ≤ value in x ≤ value in right child of x
• Basic operations
  – Construct an empty BST
  – Determine if BST is empty
  – Search BST for given item

ADT Binary Search Tree (BST)
• Basic operations (ctd)
  – Insert a new item in the BST
    • Maintain the BST property
  – Delete an item from the BST
    • Maintain the BST property
  – Traverse the BST
    • Visit each node exactly once
    • The inorder traversal must visit the values in the nodes in ascending order
• View BST class template, Fig. 12-1

BST Traversals
• Note that recursive calls must be made
  – To left subtree
  – To right subtree
• Must use two functions
  – Public method to send message to BST object
  – Private auxiliary method that can access BinNodes and pointers within these nodes
• Similar solution to graphic output
  – Public graphic method
  – Private graphAux method

BST Searches
• Search begins at root
  – If that is desired item, done
• If item is less, move down left subtree
• If item searched for is greater, move down right subtree
• If item is not found, we will run into an empty subtree
• View search()
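The search steps above can be sketched as a loop that walks down from the root. This is a minimal illustration with an assumed int-valued node type, not the search() member of the textbook's BST class template; the name locptr follows the pointer name used on the insertion slide.

```cpp
// Minimal node type, assumed for illustration.
struct BinNode
{
    int data;
    BinNode * left;
    BinNode * right;
};

// Returns true if item occurs in the BST rooted at root.
bool search(const BinNode * root, int item)
{
    const BinNode * locptr = root;          // search begins at the root
    while (locptr != nullptr)
    {
        if (item < locptr->data)
            locptr = locptr->left;          // item is less: go left
        else if (locptr->data < item)
            locptr = locptr->right;         // item is greater: go right
        else
            return true;                    // desired item found
    }
    return false;                           // ran into an empty subtree
}
```

Each comparison discards one subtree, so the number of steps is bounded by the height of the tree rather than by the number of nodes.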
Inserting into a BST
• Insert function
  – Uses modified version of search to locate insertion location or already existing item
  – Pointer parent trails search pointer locptr, keeps track of parent node
  – Thus new node can be attached to BST in proper place
• View insert() function

Recursive Deletion
Three possible cases to delete a node, x, from a BST
1. The node, x, is a leaf
2. The node, x, has one child
3. The node, x, has two children
  – Replace contents of x with inorder successor
  – Delete node pointed to by xSucc as described for cases 1 and 2

Problem of Lopsidedness
• Trees can be totally lopsided
  – Suppose each node has a right child only
  – Degenerates into a linked list
• Processing time affected by "shape" of tree

Hash Tables
• Recall order of magnitude of searches
  – Linear search O(n)
  – Binary search O(log₂ n)
  – Balanced binary tree search O(log₂ n)
  – Unbalanced binary tree can degrade to O(n)
Hash Tables
• In some situations faster search is needed
  – Solution is to use a hash function
  – Value of key field given to hash function
  – Location in a hash table is calculated

Hash Functions
• Simple function could be to mod the value of the key by the size of the table
  – h(x) = x % tableSize
• Note that we have gained speed at the cost of wasted space
  – Table must be considerably larger than number of items anticipated
  – Suggested to be 1.5-2x larger

Hash Functions
• Observe the problem with same value returned by h(x) for different values of x
  – Called collisions
• A simple solution is linear probing
  – Empty slots marked with -1
  – Linear search begins at collision location
  – Continues until empty slot found for insertion

Hash Functions
• When retrieving a value, linear probe until found
  – If empty slot encountered then value is not in table
• If deletions permitted
  – Slot can be marked so it will not be empty and cause an invalid linear probe
  – Ex. -1 for unused slots, -2 for slots which used to contain data

Collision Reduction Strategies
• Strategies for improved performance
  – Increase table capacity (fewer collisions)
  – Use different collision resolution technique
  – Devise different hash function
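The linear-probing scheme from the Hash Functions slides (h(x) = x % tableSize, -1 for unused slots, -2 for slots that used to contain data) can be sketched as follows. The class name ProbingTable and its member names are assumptions; the sketch also assumes nonnegative integer keys, a capacity greater than zero, and does not check for duplicate keys.

```cpp
#include <vector>

const int EMPTY = -1;     // slot has never held data
const int DELETED = -2;   // slot used to hold data

// Hypothetical open-addressing table for nonnegative int keys.
class ProbingTable
{
public:
    explicit ProbingTable(int capacity) : slots(capacity, EMPTY) {}

    // Probes from h(key) until a free slot is found.
    // Returns false if every slot is occupied.
    bool insert(int key)
    {
        const int n = static_cast<int>(slots.size());
        for (int i = 0; i < n; ++i)
        {
            const int loc = (key % n + i) % n;
            if (slots[loc] == EMPTY || slots[loc] == DELETED)
            {
                slots[loc] = key;
                return true;
            }
        }
        return false;
    }

    // Returns the slot index holding key, or -1 if key is not in the table.
    int find(int key) const
    {
        const int n = static_cast<int>(slots.size());
        for (int i = 0; i < n; ++i)
        {
            const int loc = (key % n + i) % n;
            if (slots[loc] == EMPTY)
                return -1;          // never-used slot ends the probe
            if (slots[loc] == key)
                return loc;
        }
        return -1;
    }

    // Marks the slot as DELETED so later probes continue past it.
    bool remove(int key)
    {
        const int loc = find(key);
        if (loc < 0)
            return false;
        slots[loc] = DELETED;
        return true;
    }

private:
    std::vector<int> slots;
};
```

With a table of size 7, the keys 3, 10, and 17 all hash to slot 3 and end up in slots 3, 4, and 5. Removing 10 leaves a DELETED marker in slot 4, so a later search for 17 still probes past it instead of stopping early.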
Collision Reduction Strategies
• Hash table capacity
  – Table size should be 1.5 to 2 times the number of items to be stored
  – Otherwise probability of collisions is too high

Collision Reduction Strategies
• Linear probing can result in primary clustering
• Consider quadratic probing
  – Probe sequence from location i is i + 1, i - 1, i + 4, i - 4, i + 9, i - 9, …
  – Secondary clusters can still form
• Double hashing
  – Use a second hash function to determine probe sequence

Collision Reduction Strategies
• Chaining
  – Table is a list or vector of head nodes to linked lists
  – When item hashes to location, it is added to that linked list

Improving the Hash Function
• Ideal hash function
  – Simple to evaluate
  – Scatters items uniformly throughout table
• Modulo arithmetic not so good for strings
  – Possible to manipulate numeric (ASCII) value of first and last characters of a name
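Chaining and a string hash built from the first and last characters can be combined in one sketch. The class name ChainedTable, the choice of adding the two character codes, and the treatment of the empty string are all assumptions made for illustration; the slides do not specify a particular formula.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical chained hash table of names. Assumes capacity > 0.
class ChainedTable
{
public:
    explicit ChainedTable(std::size_t capacity) : buckets(capacity) {}

    // Assumed hash: sum of the codes of the first and last characters,
    // reduced modulo the table size. The empty string hashes to 0.
    std::size_t hash(const std::string & name) const
    {
        if (name.empty())
            return 0;
        const unsigned first = static_cast<unsigned char>(name[0]);
        const unsigned last = static_cast<unsigned char>(name[name.size() - 1]);
        return (first + last) % buckets.size();
    }

    // Adds name to the linked list at its hash location (no duplicates).
    void insert(const std::string & name)
    {
        if (!contains(name))
            buckets[hash(name)].push_back(name);
    }

    // Searches only the one chain that name hashes to.
    bool contains(const std::string & name) const
    {
        const std::list<std::string> & chain = buckets[hash(name)];
        for (std::list<std::string>::const_iterator it = chain.begin();
             it != chain.end(); ++it)
        {
            if (*it == name)
                return true;
        }
        return false;
    }

private:
    std::vector<std::list<std::string> > buckets;   // one chain per location
};
```

Names such as "Bob" and "Bab" share first and last characters, so they collide and land in the same chain. A hash that looks at only two characters scatters names poorly, which is why the ideal function should spread items uniformly.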