Operating Systems (234123) – Winter 2011
Homework Wet 3
Due date: Sunday, 02/01/2011, 12:30 PM
Teaching assistant in charge:
• Michael Kuperstein
E-mails regarding this exercise should be sent only to the following address:
[email protected], with the subject line cs234123hw3_wet.
Introduction
In "Data Structures 1" you have studied several hash table implementations. However
all of them had one thing in common – they could only be accessed sequentially by a
single thread. In this assignment you will implement a concurrent hash table – a hash
table that can be accessed and modified by several threads simultaneously.
Review – Hash Tables with Chaining
A hash table is a data structure designed to support three operations in expected O(1)
time: lookup, insertion, and deletion. The hash table itself is an array of m slots (also
called buckets), and it uses a hash function f() to map elements into those slots. When an
element e is inserted into the hash table, it is placed in slot f(e). To look up whether
an element e is in the table, we only need to compute f(e) and check whether the
element is in that slot.
The problem with this very basic scheme is collisions. A collision happens when two
elements, e1 and e2, are hashed into the same slot. There are several ways to resolve this
problem, and in this exercise we will use "chaining". In this technique every slot in the
hash table contains a linked list of elements. For example, consider a hash table with 10
slots that contains integers, and the hash function f(e)=e%10 . Suppose the user inserts
5 elements with the values 1, 15, 31, 55 and 57. The resulting hash table will look like
this:
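(The original figure is not reproduced in this transcript. With f(e)=e%10 the five values hash to slots 1, 5, 1, 5 and 7 respectively, so the table holds the chains below; the order of elements within a chain depends on the insertion policy and may differ.)

slot 0: empty
slot 1: 1 -> 31
slot 2: empty
slot 3: empty
slot 4: empty
slot 5: 15 -> 55
slot 6: empty
slot 7: 57
slot 8: empty
slot 9: empty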
Dynamic Splitting
In practice, the worst-case performance of a hash-table operation depends on the
maximum number of elements in any one bucket. If some buckets get too large, we
would like to increase the number of buckets, thus getting a better distribution of the
elements and decreasing the number of elements in most buckets. In this exercise we
will use a very simple scheme to handle this issue.
Your hash table will have a predefined maximum number of elements per bucket, max.
Whenever an insert operation causes any bucket to have more than max elements, a
“split” operation is triggered: the number of buckets is doubled, and elements are
redistributed into the new buckets.
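To make the splitting rule concrete, here is a purely sequential sketch that ignores every concurrency requirement of this assignment; the structures and names (seq_node_t, seq_table_t, seq_split and their fields) are invented for illustration and are not part of the provided hashtable.h.

#include <stdlib.h>

/* Hypothetical sequential layout, for illustration only. */
typedef struct seq_node_t {
    int val;
    struct seq_node_t* next;
} seq_node_t;

typedef struct {
    seq_node_t** buckets;   /* array of chain heads           */
    int*         sizes;     /* number of elements per bucket  */
    int          nbuckets;  /* current number of buckets      */
    int          max;       /* split threshold per bucket     */
    int (*hash)(int, int);  /* hash(nbuckets, val)            */
} seq_table_t;

/* Double the number of buckets and rehash every element with the new
   bucket count.  A real implementation must re-check the sizes
   afterwards, since a bucket may still exceed max (see the Notes). */
static void seq_split(seq_table_t* t)
{
    int new_n = t->nbuckets * 2;
    seq_node_t** new_buckets = calloc(new_n, sizeof(*new_buckets));
    int* new_sizes = calloc(new_n, sizeof(*new_sizes));
    if (!new_buckets || !new_sizes) {           /* allocation failure: keep old table */
        free(new_buckets);
        free(new_sizes);
        return;
    }
    for (int b = 0; b < t->nbuckets; ++b) {
        seq_node_t* cur = t->buckets[b];
        while (cur) {
            seq_node_t* next = cur->next;
            int nb = t->hash(new_n, cur->val);  /* redistribute into the new buckets */
            cur->next = new_buckets[nb];
            new_buckets[nb] = cur;
            new_sizes[nb]++;
            cur = next;
        }
    }
    free(t->buckets);
    free(t->sizes);
    t->buckets  = new_buckets;
    t->sizes    = new_sizes;
    t->nbuckets = new_n;
}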
Interface
Your hash table must support the following operations:
Allocation:
hashtable_t* hash_alloc()
This function allocates a new hash table and returns a pointer to the newly allocated structure.
Deallocation:
void hash_free(hashtable_t* table)
This function frees a previously allocated hash table.
Initialization:
void hash_init(hashtable_t* table, int (*hash)(int, int),
               int buckets, int max)
This function initializes the hash table. The parameters passed to the function are:
• table – a pointer to an (already allocated) hash table structure.
• buckets – the initial number of buckets in the hash table.
• hash – the hash function the table will use. The first parameter to hash() is the current number of buckets, and the second is the value to be hashed.
• max – the maximum number of elements in a bucket before a split is triggered.
Note that this function does not allocate the hashtable_t structure itself.
Destruction:
void hash_destroy(hashtable_t* table)
This function destroys the hash table. Note that this function should not free the structure
pointed to by the table pointer itself. It is the caller’s responsibility to call
hash_free(table) after destruction is complete.
Single-element insertion:
int hash_insert(hashtable_t* table, int val)
This function inserts a single element with the value val into the hash table. Note that if an
element already exists in the hash table, it should not be added again.
Return values:
• 1: The element has been inserted
• 0: The element has not been inserted because it is already in the table
• -1: Error
Single-element removal:
int hash_remove(hashtable_t* table, int val)
This function removes a single element with the value val from the hash table.
Return values:
• 1: The element has been removed
• 0: The element has not been removed because it was not in the table
• -1: Error
Single-element contains:
int hash_contains(hashtable_t* table, int val)
This function checks whether an element with the value val is in the hash table.
Return values:
• 1: The element is in the hash table
• 0: The element is not in the hash table
• -1: Error
Bucket size query:
int hash_getbucketsize(hashtable_t* table, int bucket)
This function returns the number of elements currently in bucket number bucket, or -1 on
error.
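For illustration, here is how a caller might exercise these single-element operations; the modulo hash below is only an example that matches the required hash-function signature, and the exact contents of hashtable.h are those provided on the course website.

#include <stdio.h>
#include "hashtable.h"

/* Example hash function: the first parameter is the current number of
   buckets, the second is the value to be hashed. */
static int mod_hash(int buckets, int val)
{
    return val % buckets;
}

int main(void)
{
    hashtable_t* t = hash_alloc();
    hash_init(t, mod_hash, 10, 4);   /* 10 initial buckets, split above 4 per bucket */

    hash_insert(t, 15);              /* returns 1: inserted                 */
    hash_insert(t, 15);              /* returns 0: already in the table     */
    hash_contains(t, 15);            /* returns 1: found                    */
    printf("bucket 5 holds %d element(s)\n", hash_getbucketsize(t, 5));
    hash_remove(t, 15);              /* returns 1: removed                  */

    hash_destroy(t);                 /* tear the table down...              */
    hash_free(t);                    /* ...then free the structure itself   */
    return 0;
}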
Batch operations:
void hash_batch(hashtable_t* table, int num_ops, op_t* ops)
Your hash table must support requests to perform several different operations concurrently.
The additional parameters passed to the hash_batch function are:
• num_ops – the number of operations in the batch
• ops – an array of op_t structures, where each op_t structure represents a
single-element operation. The definition of this structure is below:
typedef struct op_t
{
    int val;
    enum {INSERT, REMOVE, CONTAINS} op;
    int result;
} op_t;
All operations in the batch must run concurrently. This means every operation must be executed
by a different thread. All of those threads must be created as early as possible – you should not
wait for some operations in a batch to complete before starting others.
The return value of each operation should be written to the result field in the corresponding
op_t structure. The hash_batch function returns only after all operations in the batch
complete.
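A sketch of one way hash_batch could dispatch its operations is shown below: one thread per operation, all created up front, and joined before returning. It assumes the single-element functions above and the op_t layout from hashtable.h; batch_arg_t and run_one_op are names invented here, and error handling for allocation and thread creation is omitted.

#include <pthread.h>
#include <stdlib.h>
#include "hashtable.h"

typedef struct {
    hashtable_t* table;
    op_t*        op;
} batch_arg_t;

/* Each thread performs exactly one single-element operation and stores
   its return value in the corresponding op_t. */
static void* run_one_op(void* p)
{
    batch_arg_t* a = p;
    switch (a->op->op) {
    case INSERT:   a->op->result = hash_insert(a->table, a->op->val);   break;
    case REMOVE:   a->op->result = hash_remove(a->table, a->op->val);   break;
    case CONTAINS: a->op->result = hash_contains(a->table, a->op->val); break;
    }
    return NULL;
}

void hash_batch(hashtable_t* table, int num_ops, op_t* ops)
{
    pthread_t*   tids = malloc(num_ops * sizeof(*tids));
    batch_arg_t* args = malloc(num_ops * sizeof(*args));
    for (int i = 0; i < num_ops; ++i) {   /* create all threads as early as possible */
        args[i].table = table;
        args[i].op    = &ops[i];
        pthread_create(&tids[i], NULL, run_one_op, &args[i]);
    }
    for (int i = 0; i < num_ops; ++i)     /* return only after every operation completes */
        pthread_join(tids[i], NULL);
    free(args);
    free(tids);
}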
The header file hashtable.h that defines those required functions and data structures can
be found on the course website.
Implementation Requirements
All of the above operations may be called concurrently. At any given moment there may be
several single-element operations running concurrently with several batch operations, and
each of those batch operations itself runs many single-element operations.
Operations involving different elements should not wait for each other unless this is strictly
necessary (except for the "stop-the-world" operations described later). This means that you cannot
use a single lock for the entire hash table. Furthermore, since different values may be hashed
into the same bucket, you cannot make do with one lock per bucket either. Therefore, you must have a
separate lock for each element in the hash table. Implementations that lock the entire table or an
entire bucket in order to insert or remove a value will not be accepted.
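To make the per-element locking requirement concrete, one possible internal node layout (purely a sketch; hash_node_t is not part of the provided header) is:

#include <pthread.h>

/* Each element carries its own mutex, so threads working on different
   elements of the same chain do not have to serialize behind a
   bucket-wide or table-wide lock. */
typedef struct hash_node_t {
    int                 val;
    pthread_mutex_t     lock;   /* protects this node's value and next pointer */
    struct hash_node_t* next;
} hash_node_t;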
“Stop-the-world” operations
There are two special operations that are executed alone, without any interference from
concurrent operations on the hash table. In a sense they “stop the world”, because when one of
them is executing, no new operation can begin.
1. Destruction: the hash_destroy() function should wait for all executing operations
to complete, but once destruction has started, all new operations must fail.
2. Splitting: Because splitting is a complex operation that might affect the whole table, it
should be allowed to run without interference. However, we do not want to delay splits
any more than necessary. Therefore, once the need to split has been detected:
o All operations already in progress are allowed to finish.
o Any future operations are delayed until the split is done (one possible coordination scheme is sketched below).
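One possible (not mandated) way to coordinate such a stop-the-world phase with standard POSIX primitives is a readers-writer lock: every single-element operation holds it for reading, while a split or destroy takes it for writing, which waits for operations already in flight and blocks new ones until it releases the lock. The names below are invented for this sketch and are not part of the provided header.

#include <pthread.h>

/* Hypothetical table-wide coordination lock. */
static pthread_rwlock_t world_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Every single-element operation brackets its work like this; any
   number of such readers may run concurrently. */
static void op_enter(void) { pthread_rwlock_rdlock(&world_lock); }
static void op_leave(void) { pthread_rwlock_unlock(&world_lock); }

/* A stop-the-world phase (split or destroy) takes the write side:
   it waits for in-progress operations and delays new ones until done. */
static void world_stop(void)   { pthread_rwlock_wrlock(&world_lock); }
static void world_resume(void) { pthread_rwlock_unlock(&world_lock); }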
Notes
• You must implement this exercise in C. Java/C++/Assembly are prohibited.
• Use only standard POSIX threads and synchronization functions.
• Do not change the provided header file.
• We only require you to submit the implementation of your hash table, but not any test programs that use the data structure. Of course, this does not mean you will not need a test program for debugging!
• Insertion and removal from a linked list that has a lock for each element are not trivial to implement correctly. In particular, you might have to acquire several locks to perform an operation (a sketch of one such traversal appears after this list). Think before you start working!
• It is possible that after a split is performed some buckets still contain too many elements. This should trigger another split.
• On the other hand, it is possible that between the moment the need to split was detected and the moment the split can actually begin (all pending operations have finished), the maximum bucket size no longer exceeds the limit. In this case, the split should not take place.
• You should try to make your implementation as concurrent and as efficient as possible. The performance of your solution can affect your grade. However, efficiency must not come at the cost of correctness!
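The remark about acquiring several locks refers to hand-over-hand locking (lock coupling): while traversing a chain, take the next node's lock before releasing the current one, so the link being followed cannot be removed or rewired underneath you. Below is a sketch of a lookup using the hypothetical hash_node_t from the Implementation Requirements section, together with an equally hypothetical per-bucket head lock; neither is part of the provided header.

#include <pthread.h>

typedef struct hash_node_t {
    int                 val;
    pthread_mutex_t     lock;
    struct hash_node_t* next;
} hash_node_t;

/* Hand-over-hand search of one chain.  head_lock protects the bucket's
   head pointer; each node's own lock protects its value and next link. */
static int chain_contains(pthread_mutex_t* head_lock, hash_node_t** head, int val)
{
    pthread_mutex_lock(head_lock);
    hash_node_t* cur = *head;
    if (cur == NULL) {
        pthread_mutex_unlock(head_lock);
        return 0;
    }
    pthread_mutex_lock(&cur->lock);      /* lock the first node...            */
    pthread_mutex_unlock(head_lock);     /* ...before giving up the head lock */
    while (cur != NULL) {
        if (cur->val == val) {
            pthread_mutex_unlock(&cur->lock);
            return 1;
        }
        hash_node_t* next = cur->next;
        if (next != NULL)
            pthread_mutex_lock(&next->lock);  /* acquire the next lock first...  */
        pthread_mutex_unlock(&cur->lock);     /* ...then release the current one */
        cur = next;
    }
    return 0;
}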
Submission
• Your submission should consist of two parts: an electronic and a printed submission.
• The printed submission should contain the documentation of your design. In it we expect you to describe the way you used POSIX threads and synchronization primitives to achieve the goal of maximum concurrency.
• The electronic submission should contain the implementation of your hash table. You should create a zip file containing all the files that you have used. Make sure your zip file contains these files without any directories in it.
1. All source and header files that are part of your implementation.
2. A makefile named "Makefile" that creates a library against which we can link
our test code. The library should be compiled as a static library. The library
filename should be libhash.a. It should include the implementation of the
functions defined in the Interface section. If you want to create a static library to
contain several object files you can do it in the following way:
ar rcs libhash.a obj1.o obj2.o
3. A file named submitters.txt which includes the ID, name and email of the
participating students. The following format should be used:
Bill Gates [email protected] 123456789
Linus Torvalds [email protected] 234567890
Steve Jobs jobs@os_is_best.com 345678901
Important Note: Match the outlined zip structure exactly. In particular, the zip should contain
files (no directories!). You can create the zip by running (inside VMware):
zip final.zip hashtable.h <your source and header files> submitters.txt
The zip should look as follows:
zipfile -+
|
+- Makefile
|
+- hashtable.h
|
+- …
|
+- submitters.txt