Hash Table Functions

Redouan Lahmyed*, Parid Weasamae*, Said Charfi*, Sami Saad Ahmed*
* Department of Computer Science, Mysore University, II Semester, MSc.

Abstract: The hash table is one of the fastest data structures and is used when search and insertion operations must be processed quickly. It uses a hash function that converts input values into other values, namely indexes into the table. When a collision happens, it can be resolved using the ‘open addressing’ method or ‘separate chaining’. With open addressing, double hashing is efficient, except when the table does not change and no insertions occur after creation, in which case ‘linear hashing’ (that is, linear probing) is better. Separate chaining is in general better than open addressing: its performance depends far less on the ‘load factor’, and any operation can easily be performed after the table has been created.

Keywords: algorithm, algorithm analysis, hash table, hashing, separate chaining.

I. INTRODUCTION

The Internet has grown to millions of users generating terabytes of content every day. According to internet data tracking services, the amount of content on the internet doubles every six months. With this kind of growth, it is impossible to find anything on the internet unless we develop new data structures and algorithms for storing and accessing data.

So what is wrong with traditional data structures such as arrays and linked lists? Suppose we have a very large data set stored in an array. The time required to look up an element in the array is either O(log n) or O(n), depending on whether the array is sorted or not. If the array is sorted, a technique such as binary search can be used to search it. Otherwise, the array must be searched linearly. Neither case is desirable if we need to process a very large data set. Therefore we discuss a technique called hashing that allows us to update and retrieve any entry in constant time, O(1), on average.
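The two array lookups described above can be sketched in C++ as follows. This is only an illustration and is not part of the application presented later; the function names containsLinear and containsBinary are hypothetical, and the sketch relies on the standard algorithms std::find and std::binary_search.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Linear search: works on unsorted data and inspects up to n elements, O(n).
bool containsLinear(const std::vector<int>& values, int target) {
    return std::find(values.begin(), values.end(), target) != values.end();
}

// Binary search: requires data sorted in ascending order and halves the
// remaining range at every step, O(log n).
bool containsBinary(const std::vector<int>& sortedValues, int target) {
    assert(std::is_sorted(sortedValues.begin(), sortedValues.end()));
    return std::binary_search(sortedValues.begin(), sortedValues.end(), target);
}
```

Both functions still depend on n, which is the cost that hashing tries to avoid.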
Constant time, or O(1), performance means that the time needed to perform the operation does not depend on the data size n.

In a mathematical sense, a map is a relation between two sets. We can define a map M as a set of pairs of the form (key, value) where, given a key, we can find its value using some kind of “function” that maps keys to values. The location of a given key can be calculated using a function called a hash function. In its simplest form, we can think of an array as a map where the key is the index and the value is the element stored at that index. For example, given an array A, if i is the key, then we can find the value by simply looking up A[i].

A hash table generalizes the idea of an array so that the key does not have to be an integer. We can have a name as a key or, for that matter, any object. The trick is to find a hash function that computes an index, so that an object can be stored at a specific location in a table and easily be found again.

The hash table is considered one of the fastest and most important data structures, and many applications, such as spell checkers, use it. This structure provides an easy and very fast way to access data, largely regardless of the size of that data. In spite of these advantages, there are some challenges:

1. A hash table is built on an array, whose size is difficult to increase once it has been created, because growing it requires rehashing the stored elements.
2. The performance of the structure gradually decreases as the number of full cells in the array increases.
3. Visiting the elements of a hash table in order is very difficult, because elements are stored in effectively random locations.

II. HASHING FUNCTION

A hash function is any algorithm or subroutine that maps large data sets of variable length to smaller data sets of a fixed length. For example, a person's name, which has a variable length, could be hashed to a single integer.
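As a brief illustration of hashing a variable-length name to a single integer, the sketch below uses the C++ standard library functor std::hash. The helper name slotForName and the table size passed to it are assumptions made for this example; the actual values std::hash returns are implementation-defined, so no particular output is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a name of any length to a slot index in
// the range [0, tableSize).
std::size_t slotForName(const std::string& name, std::size_t tableSize) {
    assert(tableSize > 0);
    // std::hash yields a fixed-width integer; the modulo keeps it in range.
    return std::hash<std::string>{}(name) % tableSize;
}
```

Calling slotForName twice with the same name and table size gives the same slot within one program run, which is what allows a stored record to be found again.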
The values returned by a hash function are called hash values.

First, we discuss how to convert keys into indexes. In the simple case, if we have 1000 records, we can use an array of size 1000 and store each record in its own cell. When two or more keys are mapped to the same cell, however, one location in the array must accommodate several values; we call this problem a ‘collision’. So, it is important to have a key with which any record in the array can be accessed. The index number of the array is a good way to access data, but the key is not always a numerical value. If the key is a string, how can we find the appropriate index for that data?

It is difficult to find a “perfect” hash function, that is, a function that has no collisions. But we can do “better” by using hash functions as follows. Suppose we need to store a dictionary in a hash table. A dictionary is a set of strings, and we can define a hash function on it. The task is to convert a string into an index. There are many methods to do that. The easiest way is to convert every character into a number using its ASCII code and combine the numbers, but then many strings will be stored in the same location. The modulo operator is applied to the result to guarantee that the index is within the range of the array.

There are many ways to resolve collisions. The ‘open addressing’ method and ‘separate chaining’ are the two methods considered here. Open addressing can be implemented using double hashing or ‘linear hashing’ (that is, linear probing). Double hashing is efficient, except when the table does not change and no insertions occur after creation, in which case ‘linear hashing’ is better. The second method, ‘separate chaining’, is in general better than the first: its performance depends far less on the ‘load factor’, and any operation can easily be performed after the table has been created.

In this paper we demonstrate the method called ‘separate chaining’. In this method, we have an array in which every cell points to a linked list.
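The character-code approach and the array-of-lists layout described above can be sketched for string keys as follows. This is a minimal illustration under stated assumptions, not the paper's application (which stores integers): the names charSumHash and StringChainTable are hypothetical, and summing the character codes is just one simple way to combine them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: add up the character codes of the key, then apply
// the modulo operator so the index stays inside the table.
std::size_t charSumHash(const std::string& key, std::size_t tableSize) {
    assert(tableSize > 0);
    std::size_t sum = 0;
    for (unsigned char c : key) sum += c;
    return sum % tableSize;
}

// Hypothetical chained table: every cell holds a list of the keys that
// hash to that cell.
class StringChainTable {
public:
    explicit StringChainTable(std::size_t size) : cells(size) { assert(size > 0); }

    void insert(const std::string& key) {
        std::list<std::string>& chain = cells[charSumHash(key, cells.size())];
        if (std::find(chain.begin(), chain.end(), key) == chain.end())
            chain.push_back(key);  // colliding keys simply share one chain
    }

    bool contains(const std::string& key) const {
        const std::list<std::string>& chain = cells[charSumHash(key, cells.size())];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

private:
    std::vector<std::list<std::string>> cells;
};
```

Because the sum ignores character order, anagrams such as "listen" and "silent" always collide under charSumHash; the chain keeps both, so each can still be found.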
When a string is hashed and its index is obtained, the string is placed in the list that the index refers to. If a collision happens, the string is added to that same list. In this way, all elements are stored in the hash table.

Figure 1. Calculating the distance between two points.
Figure 4. Graph representing the complexity of the second algorithm.

III. HASHING FUNCTION APPLICATION

The following is an application using separate chaining in the C++ programming language:

    // separate chaining
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <vector>
    using namespace std;

    // A single node of a singly linked list.
    class Data {
    private:
        int value;
        Data* next;
    public:
        Data(int d) : value(d), next(NULL) { }
        void setData(int d) { value = d; }
        int getData() const { return value; }
        void setNext(Data* d) { next = d; }
        Data* getNext() const { return next; }
    };

    // One chain of the table: a linked list kept in ascending order.
    class SortedLinkedList {
    private:
        Data* first;
    public:
        SortedLinkedList();
        ~SortedLinkedList();
        void insert(int key);
        void remove(int key);
        Data* find(int key);
        friend ostream& operator<<(ostream& ostr, const SortedLinkedList& list);
    };

    SortedLinkedList::SortedLinkedList() {
        first = NULL;
    }

    // Free every node that is still in the list.
    SortedLinkedList::~SortedLinkedList() {
        while (first != NULL) {
            Data* next = first->getNext();
            delete first;
            first = next;
        }
    }

    // Insert key so that the list stays sorted.
    void SortedLinkedList::insert(int key) {
        Data* data = new Data(key);
        Data* prev = NULL;
        Data* current = first;
        while (current != NULL && key > current->getData()) {
            prev = current;
            current = current->getNext();
        }
        if (prev == NULL)
            first = data;
        else
            prev->setNext(data);
        data->setNext(current);
    }

    // Unlink and delete the first node holding key; do nothing if key is absent.
    void SortedLinkedList::remove(int key) {
        Data* prev = NULL;
        Data* current = first;
        while (current != NULL && key != current->getData()) {
            prev = current;
            current = current->getNext();
        }
        if (current == NULL)
            return;
        if (prev == NULL)
            first = current->getNext();
        else
            prev->setNext(current->getNext());
        delete current;
    }

    // The list is sorted, so the search can stop at the first larger value.
    Data* SortedLinkedList::find(int key) {
        Data* current = first;
        while (current != NULL && current->getData() <= key) {
            if (current->getData() == key)
                return current;
            current = current->getNext();
        }
        return NULL;
    }

    ostream& operator<<(ostream& ostr, const SortedLinkedList& list) {
        ostr << "Content: ";
        Data* current = list.first;
        while (current != NULL) {
            ostr << current->getData() << " ";
            current = current->getNext();
        }
        ostr << endl;
        return ostr;
    }

    // Hash table with separate chaining: one sorted list per cell.
    // Copying a HashTable is not supported in this simple version.
    class HashTable {
    private:
        vector<SortedLinkedList*> hashTable;
        int size;
    public:
        HashTable(int size);
        ~HashTable();
        void insert(int key);
        void remove(int key);
        Data* find(int key);
        friend ostream& operator<<(ostream& ostr, const HashTable& tbl);
    private:
        int hashFunction(int key);
    };

    HashTable::HashTable(int size) {
        hashTable.resize(size);
        this->size = size;
        for (int i = 0; i < size; i++)
            hashTable[i] = new SortedLinkedList;
    }

    HashTable::~HashTable() {
        for (int i = 0; i < size; i++)
            delete hashTable[i];
    }

    void HashTable::insert(int key) {
        int hash = hashFunction(key);
        hashTable[hash]->insert(key);
    }

    void HashTable::remove(int key) {
        int hash = hashFunction(key);
        hashTable[hash]->remove(key);
    }

    Data* HashTable::find(int key) {
        int hash = hashFunction(key);
        return hashTable[hash]->find(key);
    }

    // The modulo keeps the index in range; negative keys are shifted to [0, size).
    int HashTable::hashFunction(int key) {
        int index = key % size;
        return index < 0 ? index + size : index;
    }

    ostream& operator<<(ostream& ostr, const HashTable& tbl) {
        for (int i = 0; i < tbl.size; i++) {
            if (tbl.hashTable[i] != NULL)
                ostr << i << " : " << *tbl.hashTable[i] << " ";
            else
                ostr << "** ";
        }
        ostr << endl;
        return ostr;
    }

    int menu() {
        cout << "[1] insert value " << endl
             << "[2] find value " << endl
             << "[3] remove value " << endl
             << "[4] display table" << endl
             << "[5] exit " << endl
             << " >> ";
        int choice;
        if (!(cin >> choice))
            return 5;  // leave the program on unreadable input
        return choice;
    }

    int main(int argc, char* argv[]) {
        cout << "Enter Size of hash : ";
        int size, key;
        if (!(cin >> size) || size <= 0) {
            cout << "Size must be a positive integer" << endl;
            return 1;
        }
        HashTable hashTable(size);
        // Fill half of the table with random keys.
        srand(static_cast<unsigned>(time(NULL)));
        for (int i = 0; i < size / 2; i++)
            hashTable.insert(rand() % (100 * size));
        bool state = true;
        while (state) {
            switch (menu()) {
            case 1:
                cout << "Enter key to insert: ";
                cin >> key;
                hashTable.insert(key);
                break;
            case 2:
                cout << "Enter key to find: ";
                cin >> key;
                if (hashTable.find(key) != NULL)
                    cout << "Found key " << key << endl;
                else
                    cout << "Cannot find " << key << endl;
                break;
            case 3:
                cout << "Enter key to remove : ";
                cin >> key;
                hashTable.remove(key);
                break;
            case 4:
                cout << hashTable << endl;
                break;
            case 5:
                state = false;
                break;
            }
        }
        return 0;
    }

IV. CONCLUSIONS

In this paper we introduced the idea of the hash table and an application using it. The hash table is one of the fastest data structures and is used when search and insertion operations must be processed quickly. It uses a hash function that converts input values into other values, namely indexes into the table. When a collision happens, it can be resolved using the ‘open addressing’ method or ‘separate chaining’. With open addressing, double hashing is efficient, except when the table does not change and no insertions occur after creation, in which case ‘linear hashing’ (that is, linear probing) is better. Separate chaining is in general better than open addressing: its performance depends far less on the ‘load factor’, and any operation can easily be performed after the table has been created.

REFERENCES

[1] R. G. Dromey, How to Solve It by Computer. Dorling Kindersley, India.
[2] Arab alg. website, http://www.aralg.com/vb/showthread.php?t=110831 (2013).
[3] Wikipedia, the free encyclopedia website, [Online]. http://en.wikipedia.org/wiki/Hash_function