Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Operating Systems (234123) – Winter 2011 (Homework Wet 3) Homework Wet 3 Due date: Sunday, 02/01/2011, 12:30 noon Teaching assistant in charge: • Michael Kuperstein E-mails regarding this exercise should be sent only to the following email: [email protected] with the subject line: cs234123hw3_wet. Introduction In "Data Structures 1" you have studied several hash table implementations. However all of them had one thing in common – they could only be accessed sequentially by a single thread. In this assignment you will implement a concurrent hash table – a hash table that can be accessed and modified by several threads simultaneously. Review – Hash Tables with Chaining A hash table is a data structure designed to support three operations in time O(1) : lookup, insertion and deletion. The hash table itself is an array of m slots (also called buckets), and uses a hash function f() to map elements into those slots. When an element e is inserted into the hash table it is placed in the slot f(e). To look up whether an element e is in the table, we only need to compute f(e) and check whether the element is in that slot. The problem with this very basic scheme is collisions. A collision happens when two elements, e1 and e2 are hashed into the same slot. There are several ways to resolve this problem, and in this exercise we will use "chaining". In this technique every slot in the hash table contains a linked list of elements. For example, consider a hash table with 10 slots that contains integers, and the hash function f(e)=e%10 . Suppose the user inserts 5 elements with the values 1, 15, 31, 55 and 57. The resulting hash table will look like this: Operating Systems (234123) – Winter 2011 (Homework Wet 3) Dynamic Splitting In practice, the worst-case performance of a hash-table operation depends on the maximum number of element in any one bucket. If some buckets get too large, we would like to increase the number of buckets, thus getting a better distribution of the elements and decreasing the number of elements in most buckets. In this exercise we will use a very simple scheme to handle this issue. Your hash table will have a predefined maximum number of elements per bucket, max. Whenever an insert operation causes any bucket to have more than max elements, a “split” operation is triggered: the number of buckets is doubled, and elements are redistributed into the new buckets. Interface Your hash table must support the following operations: Allocation: hashtable_t* hash_alloc() This function allocates a new hash table and returns a pointer to the newly allocated structure. Dellocation: void hash_free(hashtable_t* table) This function frees a previously allocated hash table. Initialization: void hash_init(hashtable_t* table, int (hash)(int, int), int buckets, int max) This function initializes the hash table. The parameters passed to the function are: Operating Systems (234123) – Winter 2011 (Homework Wet 3) table – a pointer to a (already allocated) hash table structure. buckets – the initial number of buckets in the hash table. hash – the hash function the table will use. The first parameter to hash() is the current number of buckets, and the second the value to be hashed. max – the maximum number of elements in a bucket before a split is triggered. Note that this function does not allocate the hashtable_t structure itself. Destruction: void hash_destroy(hashtable_t* table) This function destroys the hash table. Note that this function should not free the structure pointed to by the table pointer itself. It is the caller’s responsibility to call hash_free(table) after destruction is complete. Single-element insertion: int hash_insert(hashtable_t* table, int val) This function inserts a single element with the value val into the hash table. Note that if an element already exists in the hash table, it should not be added again. Return values: • 1: The element has been inserted • 0: The element has not been inserted because it is already in the table • -1: Error Single-element removal: int hash_remove(hashtable_t* table, int val) This function removes a single element with the value val from the hash table. Return values: • 1: The element has been removed • 0: The element has not been removed because it was not in the table • -1: Error Single-element contains: int hash_contains(hashtable_t* table, int val) This function checks whether an element with the value val is in the hash table. Operating Systems (234123) – Winter 2011 (Homework Wet 3) Return values: • 1: The element is in the hash table • 0: The element is not in the hash table • -1: Error Bucket size query int hash_getbucketsize(hashtable_t* table, int bucket) This function returns the number of elements currently in bucket number bucket, or -1 on error. Batch operations: void hash_batch(hashtable_t* table, int num_ops, op_t* ops) Your hash table must support requests to perform several different operations concurrently. The additional parameters passed to the hash_batch function are: • num_ops – the number of operations in the batch • ops - an array of pointers to op_t structures, where each op_t structure represents a single-element operation. The definition of this structure is below: typedef struct op_t { int val; enum {INSERT, REMOVE, CONTAINS} op; int result; } op_t; All operations in the batch must run concurrently. This means every operation must be executed by a different thread. All of those threads must be created as early as possible – you should not wait for some operations in a batch to complete before starting others. The return value of each operation should be written to the result field in the corresponding op_t structure. The hash_batch function returns only after all operation in the batch complete. The header file hashtable.h that defines those required functions and data structures can be found on the course website. Implementation Requirements All of the above operations may be called concurrently. At any given moment, there might be several single-element operations concurrently with several batch operations, each of those batch operations itself running many single-element operations. Operating Systems (234123) – Winter 2011 (Homework Wet 3) Operations involving different elements should not wait for each other unless this is strictly necessary (except “stop-the-world” operations, as described later). This means that you cannot use a single lock for the entire hash table. Furthermore, since different values may be hashed into the same bucket, you cannot use a single lock for every bucket. Therefore you must have a separate lock for each element in the hash table. Implementations that lock the entire table or bucket to insert or remove a value will not be accepted. “Stop-the-world” operations There are two special operations that are executed alone, without any interference from concurrent operations on the hash table. In a sense they “stop the world”, because when one of them is executing no new operation can begin. 1. Destruction: the hash_destroy() function should wait for all executing operations to complete, but once destruction has started all new operations must fail. 2. Splitting: Because splitting is a complex operation that might affect the whole table it should be allowed to run without interference. However we do not want to delay splits any more than necessary. Therefore, once the need to split has been detected: o All operations already in progress are allowed to finish. o Any future operations are delayed until the split is done. Notes • • • • • • • • You must implement this exercise in C. Java/C++/Assembly are prohibited. Use only standard POSIX threads and synchronization functions. Do not change the provided header file. We only require you to submit the implementation of your hash table, but not any test programs that use the data structure. Of course, this does not mean you will not need a test program for debugging! Insertion and removal from a linked-list that has a lock for each element are not trivial to implement correctly. In particular, you might have to acquire several locks to perform an operation. Think before you start working! It is possible that after a split is performed some buckets still contain too many elements. This should trigger another split. On the other hand, it is possible that between the moment the need to split was detected and the moment the split can actually begin (all pending operations finished) the maximum bucket size became smaller than the limit. In this case, the split should not take place. You should try to make your implementation as concurrent and as efficient as possible. The performance of your solution can affect your grade. However, efficiency must not come at the cost of correctness! Operating Systems (234123) – Winter 2011 (Homework Wet 3) Submission • • • You submission should consist of two parts: an electronic and a printed submission. The printed submission should contain the documentation of your design. In it we expect you to describe the way you used POSIX thread and synchronization primitives to achieve the goal of maximum concurrency. The electronic submission should contain the implementation of your hash table. You should create a zip file containing all the files that you have used. Make sure your zip file contains these files without any directories in it. 1. All source and header files that are part of your implementation. 2. A makefile named "Makefile" that creates a library against which we can link our test code. The library should be compiled as a static library. The library filename should be libhash.a. It should include the implementation of the functions defined in the Interface section. If you want to create a static library to contain several object files you can do it in the following way: ar rcs libhash.a obj1.o obj2.o 3. A file named submitters.txt which includes the ID, name and email of the participating students. The following format should be used: Bill Gates [email protected] 123456789 Linus Torvalds [email protected] 234567890 Steve Jobs jobs@os_is_best.com 345678901 Important Note: Make the outlined zip structure exactly. In particular, the zip should contain files (no directories!). You can create the zip by running (inside VMware): zip final.zip hashtable.h <your source and header files> submitters.txt The zip should look as follows: zipfile -+ | +- Makefile | +- hashtable.h | +- … | +- submitters.txt