Hash Table Functions
Redouan Lahmyed*, Parid Weasamae*, Said Charfi*, Sami Saad Ahmed*
*Department of Computer Science, Mysore University
II Semester, MSc.
Abstract—The hash table is one of the fastest data structures and is used when searching and insertion must be performed quickly. It relies on a hash function that converts each input key into another value: an index in the table. When a collision occurs, it can be resolved using either the ‘open addressing’ method or ‘separate chaining’. With the first method, double hashing is efficient, except when the table does not change and no insertions are made after its creation; in that case ‘linear hashing’ (linear probing) is better. The second method, ‘separate chaining’, is in general better than the first. It is not limited by the ‘load factor’, and any operation can easily be performed after the table has been created.
Keywords— algorithm, algorithm analysis, hash table, hashing, separate chaining.
I. INTRODUCTION
The Internet has grown to millions of users generating terabytes of content every day. According to Internet data tracking services, the amount of content on the Internet doubles every six months. With this kind of growth, it would be impossible to find anything on the Internet unless we develop new data structures and algorithms for storing and accessing data. So what is wrong with traditional data structures such as arrays and linked lists?
Suppose we have a very large data set stored in an array. The time required to look up an element in the array is either O(log n) or O(n), depending on whether the array is sorted or not. If the array is sorted, a technique such as binary search can be used to search it; otherwise, the array must be searched linearly. Neither case may be desirable when we need to process a very large data set.
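For comparison, the two array searches mentioned above can be sketched in C++ as follows. This is an illustrative sketch: the names linearSearch and binarySearch are our own, and binarySearch assumes the array is sorted in ascending order.

```cpp
#include <cstddef>
#include <vector>

// Linear search: inspects the elements one by one, O(n) comparisons in the
// worst case. Works on any array, sorted or not. Returns the index or -1.
int linearSearch(const std::vector<int>& a, int key) {
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] == key) return static_cast<int>(i);
    return -1;
}

// Binary search: halves the remaining range at every step, O(log n)
// comparisons. Assumes a is sorted in ascending order. Returns the index or -1.
int binarySearch(const std::vector<int>& a, int key) {
    int lo = 0, hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        if (a[mid] == key) return mid;
        if (a[mid] < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}
```

On a sorted array of one million elements, binarySearch needs at most 20 comparisons, while linearSearch may need one million; both still grow with n.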
Therefore we discuss a technique called hashing that allows us to update and retrieve any entry in constant time, O(1), on average. Constant-time, or O(1), performance means that the time needed to perform the operation does not depend on the data size n.
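The C++ standard library already provides a hash table, std::unordered_map, whose insertion, lookup and erase take constant time on average (linear time in the worst case). A minimal sketch, assuming a phone book keyed by name; the helper lookup is a hypothetical name.

```cpp
#include <string>
#include <unordered_map>

// A phone book stored in the standard library's hash table.
using PhoneBook = std::unordered_map<std::string, std::string>;

// Hypothetical helper: returns the number stored for name, or an empty
// string if name is absent. find() is O(1) on average.
std::string lookup(const PhoneBook& book, const std::string& name) {
    auto it = book.find(name);
    return it == book.end() ? std::string() : it->second;
}
```

Writing book["Alice"] = "555-0100" inserts or updates an entry, lookup(book, "Alice") retrieves it, and book.erase("Alice") removes it; none of these needs to scan the other entries.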
In a mathematical sense, a map is a relation between two sets. We can define a map M as a set of pairs of the form (key, value) such that, given a key, we can find its value using some kind of “function” that maps keys to values. The location at which the value for a given key is stored can be calculated using a function called a hash function. In its simplest form, we can think of an array as a map where the key is the index and the value is the element at that index. For example, given an array A, if i is the key, then we can find the value by simply looking up A[i]. The idea of a hash table generalizes this and can be described as follows.
The concept of a hash table is a generalization of an array in which the key does not have to be an integer. We can use a name as a key or, for that matter, any object. The trick is to find a hash function that computes an index, so that an object can be stored at a specific location in the table where it can easily be found again.
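The difference can be sketched as follows: with an array the key is already the index, while a hash table first turns an arbitrary key, such as a name, into an index. Using std::hash<std::string> followed by the modulo operator is an assumed choice of hash function, and valueAt and indexFor are illustrative names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Array as a map: the key i IS the index, so A[i] is found directly.
int valueAt(const std::vector<int>& A, std::size_t i) { return A[i]; }

// Hash table idea: any key (here a name) is first turned into an index in
// [0, tableSize). std::hash is an assumed choice of hash function.
std::size_t indexFor(const std::string& name, std::size_t tableSize) {
    return std::hash<std::string>{}(name) % tableSize;
}
```

The values produced by std::hash are implementation-specific, so the exact index of a given name may differ between compilers; only its range and its repeatability within one program are relied on here.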
The hash table is considered one of the fastest and most important data structures, and many applications, such as spell checkers, use it. It provides an easy and very fast way to access data, with an average access time that does not depend on the amount of stored data.
In spite of the advantages of this structure, there are some challenges:
1. A hash table uses an array, whose size cannot be increased in place; growing the table requires allocating a larger array and re-inserting (rehashing) every element.
2. The performance of this structure gradually decreases as the number of occupied cells in the array increases.
3. In a hash table, visiting the elements in a meaningful order (for example, sorted by key) is difficult, because the elements are stored at effectively random locations determined by the hash function.
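The first two challenges can be made concrete with a small sketch. It assumes a separate-chaining table of non-negative int keys whose index is key % size; the names Buckets, loadFactor and rehash are hypothetical. The load factor, the number of stored keys divided by the number of cells, measures how full the table is, and growing the table means building a new array and re-inserting every key, because each key's index changes with the size.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Cells of a separate-chaining table holding non-negative int keys (assumption).
using Buckets = std::vector<std::list<int>>;

// Load factor = stored keys / number of cells; longer chains mean slower lookups.
double loadFactor(const Buckets& b) {
    std::size_t n = 0;
    for (const auto& chain : b) n += chain.size();
    return static_cast<double>(n) / static_cast<double>(b.size());
}

// The array cannot grow in place: allocate a new one and re-insert every key,
// since its index key % size depends on the size.
Buckets rehash(const Buckets& old, std::size_t newSize) {
    Buckets bigger(newSize);
    for (const auto& chain : old)
        for (int key : chain)
            bigger[static_cast<std::size_t>(key) % newSize].push_back(key);
    return bigger;
}
```

Rehashing costs time proportional to the number of stored keys; a common strategy is to grow the array by a constant factor, such as doubling, so that this cost is paid only rarely.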
II. HASHING FUNCTION
A hash function is any algorithm or subroutine that
maps large data sets of variable length to smaller
data sets of a fixed length. For example, a person's
name, having a variable length, could be hashed to
a single integer. The values returned by a hash
function are called hash values.
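As one concrete example of such a function (the paper does not prescribe a particular one), the 32-bit FNV-1a hash reduces a name of any length to a single 32-bit integer. The function name fnv1a is our own.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a: a string of any length is reduced to one fixed-size value.
std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;      // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;                          // mix in one byte
        h *= 16777619u;                  // FNV prime; wraps modulo 2^32
    }
    return h;
}
```

A table index can then be obtained with fnv1a(name) % tableSize.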
First, we discuss how to convert keys into indexes. In the simple case, if we have 1000 records with keys 0 to 999, we can use an array of size 1000 and store each record in its own cell. In general, however, the range of possible keys is much larger than the number of cells, so some locations in the array must accommodate several values; we call this problem ‘collision’.
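A minimal sketch of the situation, assuming non-negative integer keys and the index key % size (indexOf and countCollisions are illustrative names): with 1000 cells, the keys 7, 1007 and 2007 all land in cell 7.

```cpp
#include <cstddef>
#include <vector>

// Index of a non-negative integer key in a table with `size` cells.
std::size_t indexOf(unsigned key, std::size_t size) { return key % size; }

// Counts keys that land in a cell already taken by an earlier key.
std::size_t countCollisions(const std::vector<unsigned>& keys, std::size_t size) {
    std::vector<bool> used(size, false);
    std::size_t collisions = 0;
    for (unsigned k : keys) {
        std::size_t i = indexOf(k, size);
        if (used[i]) ++collisions;
        else used[i] = true;
    }
    return collisions;
}
```

Whenever there are more possible keys than cells, some cell must receive more than one key, whatever the hash function.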
So, every record in the array must be reachable through its key. The array index is a good way to access data, but the key is not always a numerical value. If the key is a string, how can we find the appropriate index for that data? It is difficult to find a “perfect” hash function, that is, a function that has no collisions. But we can do “better” by using hash functions such as the following.
Suppose we need to store a dictionary in a hash table. A dictionary is a set of strings, for which we can define a hash function whose task is to convert each string into an index. There are many ways to do that. The easiest way is to convert every character into a number using its ASCII code and combine these numbers, then apply the modulo operator to guarantee that the index lies within the range of the array. With a simple sum of ASCII codes, however, many different strings (for example, anagrams such as “stop” and “pots”) are stored in the same location.
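Both ideas can be sketched in C++. asciiSumIndex adds the ASCII codes and applies the modulo operator; polynomialIndex is an alternative we add only for comparison (a polynomial hash with an assumed base of 31), in which the position of each character also matters. Both function names are illustrative.

```cpp
#include <cstddef>
#include <string>

// Simplest string hash: add the ASCII codes, then take the remainder so the
// index is always in [0, size). Anagrams always get the same index.
std::size_t asciiSumIndex(const std::string& s, std::size_t size) {
    std::size_t sum = 0;
    for (unsigned char c : s) sum += c;
    return sum % size;
}

// Polynomial variant (Horner's rule, base 31): character order now matters.
// Reducing modulo size at every step keeps intermediate values small.
std::size_t polynomialIndex(const std::string& s, std::size_t size) {
    std::size_t h = 0;
    for (unsigned char c : s) h = (h * 31 + c) % size;
    return h;
}
```

With a table of 101 cells, “stop” and “pots” share index 50 under the ASCII sum, but land at indexes 35 and 46 under the polynomial hash.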
There are many ways to resolve collisions. ‘Open addressing’ and ‘separate chaining’ are the two methods that can be used. Open addressing can be implemented with double hashing or with ‘linear hashing’ (linear probing). Double hashing is efficient, except when the table does not change and no insertions are made after its creation; in that case linear probing is better. The second method, ‘separate chaining’, is in general better than the first. It is not limited by the ‘load factor’ (the number of stored elements may even exceed the number of cells, because each chain simply grows longer), and any operation can easily be performed after the table has been created.
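The two open-addressing strategies differ only in the probe sequence followed when a cell is already occupied: the i-th probe visits (h1(k) + i) mod m for linear probing (referred to as ‘linear hashing’ above) and (h1(k) + i * h2(k)) mod m for double hashing. A minimal sketch, assuming non-negative int keys, a prime table size m of at least 2, h1(k) = k mod m and the common choice h2(k) = 1 + (k mod (m - 1)); the class name and both hash functions are illustrative assumptions, and deletion is omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical open-addressing table for non-negative int keys; -1 marks an
// empty cell. The size is assumed prime and >= 2. Deletion is omitted.
class OpenAddressingTable {
public:
    OpenAddressingTable(std::size_t size, bool doubleHashing)
        : cells(size, -1), useDouble(doubleHashing) {}

    // Stores key; returns the number of probes used, or 0 if the table is full.
    std::size_t insert(int key) {
        for (std::size_t i = 0; i < cells.size(); ++i) {
            std::size_t idx = probe(key, i);
            if (cells[idx] == -1) { cells[idx] = key; return i + 1; }
        }
        return 0;
    }

    bool contains(int key) const {
        for (std::size_t i = 0; i < cells.size(); ++i) {
            std::size_t idx = probe(key, i);
            if (cells[idx] == -1) return false;   // an empty cell ends the search
            if (cells[idx] == key) return true;
        }
        return false;
    }

private:
    std::vector<int> cells;
    bool useDouble;

    std::size_t h1(int key) const { return static_cast<std::size_t>(key) % cells.size(); }
    // Second hash is never 0, so double hashing always moves to a new cell.
    std::size_t h2(int key) const { return 1 + static_cast<std::size_t>(key) % (cells.size() - 1); }

    // i-th probe: linear probing steps by 1, double hashing steps by h2(key).
    std::size_t probe(int key, std::size_t i) const {
        std::size_t step = useDouble ? h2(key) : 1;
        return (h1(key) + i * step) % cells.size();
    }
};
```

With m = 7, inserting 3, 10 and 17 takes 1, 2 and 3 probes with linear probing, because all three keys start at cell 3 and pile up in a cluster, but only 1, 2 and 2 probes with double hashing, which sends 10 and 17 along different step sizes (5 and 6).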
In this paper we use the method called ‘Separate Chaining’. In this method, every cell of the array points to a linked list. When a string is hashed to an index, the string is put in the list that this index points to. If a collision happens, the string is simply added to the same list. In this way, all elements will be stored in the hash table.
Figure 1. Calculating the distance between two points.
Figure 4. Graph representing the complexity of the second algorithm.
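For the dictionary example, separate chaining with string keys can be sketched using std::vector and std::list. The class name Dictionary and the base-31 polynomial hash are assumptions made for illustration; the paper's own application below stores integer keys in sorted lists instead.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical dictionary using separate chaining: each cell of the array
// holds a linked list (std::list) of the words that hash to that cell.
class Dictionary {
public:
    explicit Dictionary(std::size_t size) : table(size) {}

    void add(const std::string& word) {
        std::list<std::string>& chain = table[index(word)];
        for (const std::string& w : chain)
            if (w == word) return;          // already stored
        chain.push_back(word);              // a collision just lengthens the list
    }

    bool contains(const std::string& word) const {
        for (const std::string& w : table[index(word)])
            if (w == word) return true;
        return false;
    }

    // Number of words sharing the cell that `word` hashes to.
    std::size_t chainLength(const std::string& word) const {
        return table[index(word)].size();
    }

private:
    std::vector<std::list<std::string>> table;

    // Polynomial string hash reduced modulo the table size (assumed choice).
    std::size_t index(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) h = (h * 31 + c) % table.size();
        return h;
    }
};
```

Even a table with a single cell stays correct, since every word simply joins the same chain; only the lookup time grows with the chain length.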
III. HASHING FUNCTION APPLICATION
The following is an application of separate chaining written in the C++ programming language:
// Separate chaining hash table (interactive demo)
#include <cstdlib>   // rand, srand, NULL
#include <ctime>     // time
#include <iostream>
#include <vector>
using namespace std;

// One node of a singly linked list holding an integer key.
class Data {
private:
    int value;
    Data* next;
public:
    Data(int d) : value(d), next(NULL) { }

    void setData(int d) { value = d; }
    int getData() const { return value; }

    void setNext(Data* d) { next = d; }
    Data* getNext() const { return next; }
};

// Linked list that keeps its keys in ascending order; one chain of the table.
class SortedLinkedList {
private:
    Data* first;
public:
    SortedLinkedList();
    ~SortedLinkedList();
    // The list owns its nodes, so copying is disabled.
    SortedLinkedList(const SortedLinkedList&) = delete;
    SortedLinkedList& operator=(const SortedLinkedList&) = delete;

    void insert(int key);
    void remove(int key);
    Data* find(int key);

    friend ostream& operator<<(ostream& ostr, const SortedLinkedList& list);
};

SortedLinkedList::SortedLinkedList() {
    first = NULL;
}

// Frees every node of the list.
SortedLinkedList::~SortedLinkedList() {
    while (first != NULL) {
        Data* next = first->getNext();
        delete first;
        first = next;
    }
}

// Inserts key before the first value that is not smaller, keeping the order.
void SortedLinkedList::insert(int key) {
    Data* data = new Data(key);
    Data* prev = NULL;
    Data* current = first;
    while (current != NULL && key > current->getData()) {
        prev = current;
        current = current->getNext();
    }
    if (prev == NULL)
        first = data;
    else
        prev->setNext(data);
    data->setNext(current);
}

// Unlinks and frees the first node holding key; does nothing if key is absent.
void SortedLinkedList::remove(int key) {
    Data* prev = NULL;
    Data* current = first;
    while (current != NULL && key != current->getData()) {
        prev = current;
        current = current->getNext();
    }
    if (current == NULL)
        return;
    if (prev == NULL)
        first = current->getNext();
    else
        prev->setNext(current->getNext());
    delete current;
}

// Returns the node holding key, or NULL; stops early because the list is sorted.
Data* SortedLinkedList::find(int key) {
    Data* current = first;
    while (current != NULL && current->getData() <= key) {
        if (current->getData() == key)
            return current;
        current = current->getNext();
    }
    return NULL;
}

ostream& operator<<(ostream& ostr, const SortedLinkedList& list) {
    ostr << "Content: ";
    Data* current = list.first;
    while (current != NULL) {
        ostr << current->getData() << " ";
        current = current->getNext();
    }
    ostr << endl;
    return ostr;
}

// Hash table whose cells each point to a sorted linked list (separate chaining).
class HashTable {
private:
    vector<SortedLinkedList*> hashTable;
    int size;
public:
    HashTable(int size);
    ~HashTable();
    // The table owns its lists, so copying is disabled.
    HashTable(const HashTable&) = delete;
    HashTable& operator=(const HashTable&) = delete;

    void insert(int key);
    void remove(int key);
    Data* find(int key);

    friend ostream& operator<<(ostream& ostr, const HashTable& tbl);
private:
    int hashFunction(int key);
};

HashTable::HashTable(int size) {
    hashTable.resize(size);
    this->size = size;
    for (int i = 0; i < size; i++)
        hashTable[i] = new SortedLinkedList;
}

HashTable::~HashTable() {
    for (int i = 0; i < size; i++)
        delete hashTable[i];
}

void HashTable::insert(int key) {
    int hash = hashFunction(key);
    hashTable[hash]->insert(key);
}

void HashTable::remove(int key) {
    int hash = hashFunction(key);
    hashTable[hash]->remove(key);
}

Data* HashTable::find(int key) {
    int hash = hashFunction(key);
    return hashTable[hash]->find(key);
}

// Maps key to an index in [0, size); adding size keeps negative keys in range.
int HashTable::hashFunction(int key) {
    return ((key % size) + size) % size;
}

// Prints one line per cell: its index followed by the keys in its chain.
ostream& operator<<(ostream& ostr, const HashTable& tbl) {
    for (int i = 0; i < tbl.size; i++) {
        if (tbl.hashTable[i] != NULL)
            ostr << i << " : " << *tbl.hashTable[i];
        else
            ostr << i << " : **" << endl;
    }
    return ostr;
}

// Displays the menu and reads the user's choice.
int menu() {
    cout << "[1] insert value " << endl
         << "[2] find value " << endl
         << "[3] remove value " << endl
         << "[4] display table" << endl
         << "[5] exit " << endl
         << " >> ";
    int choice;
    if (!(cin >> choice))
        return 5;   // end of input or non-numeric input: exit
    return choice;
}

int main(int argc, char* argv[]) {
    cout << "Enter Size of hash : ";
    int size, key;
    if (!(cin >> size) || size <= 0) {
        cerr << "The size must be a positive integer." << endl;
        return 1;
    }

    HashTable hashTable(size);

    // Pre-fill the table with size/2 random keys in [0, 100*size).
    time_t aTime;
    srand(static_cast<unsigned>(time(&aTime)));
    for (int i = 0; i < size / 2; i++)
        hashTable.insert(rand() % (100 * size));

    bool state = true;
    while (state) {
        switch (menu()) {
        case 1:
            cout << "Enter key to insert: ";
            if (cin >> key)
                hashTable.insert(key);
            break;
        case 2:
            cout << "Enter key to find: ";
            if (cin >> key) {
                if (hashTable.find(key) != NULL)
                    cout << "Found key " << key << endl;
                else
                    cout << "Cannot find " << key << endl;
            }
            break;
        case 3:
            cout << "Enter key to remove : ";
            if (cin >> key)
                hashTable.remove(key);
            break;
        case 4:
            cout << hashTable << endl;
            break;
        case 5:
            state = false;
            break;
        default:
            cout << "Invalid choice" << endl;
            break;
        }
    }
    return 0;
}
IV. CONCLUSIONS
In this paper we introduced the idea of the hash table and an application that uses it. The hash table is one of the fastest data structures and is used when searching and insertion must be performed quickly. It relies on a hash function that converts each input key into another value: an index in the table. When a collision occurs, it can be resolved using either the ‘open addressing’ method or ‘separate chaining’. With the first method, double hashing is efficient, except when the table does not change and no insertions are made after its creation; in that case ‘linear hashing’ (linear probing) is better. The second method, ‘separate chaining’, is in general better than the first. It is not limited by the ‘load factor’, and any operation can easily be performed after the table has been created.