TOPIC 5 ASSIGNMENT
SORTING, HASH TABLES & LINKED LISTS
Yerusha Nuh & Ivan Yu

Arrays
○ Efficient access of data.
○ Access by index.
○ A mapping between search keys and indices allows each data item to be stored in the array element with the corresponding index.

Example
There are 500 students in a school. Each student has their own nine-digit TDSB student number. If we want to assign an ID to each student name, we could use their student number as the array index. However, if the greatest student number is “351000005”, the array would need more than 351 million elements, far more than are required to store the names of 500 students.
Solution: map the student numbers onto the numbers from 0 to 499. By using arithmetic operations on keys, we can map them onto table addresses.
Advantage: direct referencing.

Mapping
Methods for mapping:
○ Direct address table
○ Hash table
Hash table – a data structure that uses a hash function to efficiently map identifiers or keys (e.g. persons’ names) to associated values (e.g. their telephone numbers).
A hash table is made up of two parts:
○ An array (the actual table where the data to be searched is stored)
○ A mapping function, a.k.a. hash function

Hash Function
Hash function – a function that transforms the search key into a table address.
Different hash functions use different arithmetic operations to do this. We will focus on modular arithmetic.

Hash Function: Modulo Arithmetic
Numbers as keys
Address = search key % size of array
Pseudocode - Number
get number
address = key % size of array

Strings as keys
Take the binary representation of a key as a number and then apply the first case. In general, the arithmetic operations in such expressions will use 32-bit modular arithmetic, ignoring overflow.
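The numeric-key case can be sketched in Java as follows. This is only an illustration: the class name ModuloHash and the method name address are hypothetical, the table size of 500 is taken from the school example, and Math.floorMod is used in place of a bare % as an added assumption so that a negative key still yields a valid index.

```java
// A minimal sketch of the modulo hash for numeric keys.
// ModuloHash and address are illustrative names, not from the slides.
class ModuloHash {
    // Maps a key to a table address in the range [0, tableSize).
    // For non-negative keys this equals key % tableSize; floorMod
    // additionally keeps the result non-negative for negative keys.
    static int address(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 500; // one slot per student, indices 0 to 499
        System.out.println(address(351000005, tableSize)); // 351000005 % 500 = 5
    }
}
```

Nine-digit student numbers fit comfortably in a Java int, but note that int arithmetic is 32-bit and wraps around silently on overflow rather than raising an error.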
For example: Integer.MAX_VALUE + 1 = Integer.MIN_VALUE, where
Integer.MAX_VALUE = 2147483647
Integer.MIN_VALUE = -2147483648

Example
Char:    h   e   l   l   o
Unicode: 104 101 108 108 111
$104 \cdot 31^4 + 101 \cdot 31^3 + 108 \cdot 31^2 + 108 \cdot 31^1 + 111 \cdot 31^0 = 99162322$

To avoid computing large powers, we can apply Horner’s method:
$a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x^1 + a_0 x^0 = x(x(\dots x(x(a_n x + a_{n-1}) + a_{n-2}) + \dots) + a_1) + a_0$
$99162322 = (((104 \cdot 31 + 101) \cdot 31 + 108) \cdot 31 + 108) \cdot 31 + 111$

We compute the hash function by applying the mod (%) operation at each step, thus avoiding overflow. For instance, with a multiplier of 32, character codes 22, 5, 18, 25, ... and a table of size M:
Compute h0 = (22*32 + 5) % M
Compute h1 = (32*h0 + 18) % M
Compute h2 = (32*h1 + 25) % M
Etc.

Pseudocode - String
get string
address = 0
loop (once per character of the string, taking each character in turn) {
    address = (31*address + Unicode of character) % size of array
}

Hash Table
How do we choose the size of the array (hash table)?
Let N be the number of records to be stored.
Let M be the size of the hash table.
Ideally N records are stored in a hash table of size N. However:
○ We may not have prior knowledge of the exact number of records.
○ It is possible to have two keys mapped to the same index (although this can be prevented).
Hence, we assume that the size of the table (M) can be different from the number of records (N).
Load factor – the ratio between N and M: L = N/M
The default load factor of Java's HashMap is 0.75.
Note: M should be a prime number to obtain a more even distribution of keys over the table.

Collision Resolution
Collision – when two or more keys hash to the same index.
Methods to resolve collisions:
○ Separate chaining
○ Open addressing
    ○ Linear probing
    ○ Quadratic probing
    ○ Double hashing

Linear Probing
Collision when inserting: probe the next slot in the table.
○ If unoccupied, store the key.
○ If occupied, continue probing the next slot.

Linear Probing - Collision
Searching: If the key hashes to an occupied slot but does not match the key occupying the slot, probe the next slot.
If the slot is empty, the search is unsuccessful. If the slot is occupied:
○ If it does not match, continue probing the next slot.
○ If it matches, the search is successful.
When reaching the end of the table, resume from the beginning.
Disadvantages:
○ Primary clustering – building up of large clusters
○ Runs slowly for tables that are almost full

Hash Table - Advantages
○ Speed, especially with a large number of entries (thousands or more).
○ Efficient when the maximum number of entries can be predicted in advance.
○ If the set of key-value pairs is fixed and known ahead of time (no insertions and deletions), average lookup cost can be reduced by a careful choice of the hash function, bucket table size, and internal data structures.

Hash Tables - Disadvantages
○ Can be more difficult to implement than self-balancing binary trees.
○ Difficult to create a perfect hash function.
○ Insertion or deletion may take time proportional to the number of entries (for example, when the table must be resized).
○ May not be suitable for real-time or interactive applications.
○ Even though operations take constant time on average, the cost of each operation (computing the hash function) can be significantly higher than for a sequential list or search tree.
○ Not suitable for a small number of entries.
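
To tie the pieces together, here is a minimal Java sketch of the string hash function (multiplier 31, mod applied at each step) combined with linear probing for insertion and search. It is an illustration under stated assumptions, not the presenters' implementation: the class LinearProbingTable and its methods hash, insert, indexOf and contains are hypothetical names, an empty slot is represented by null, deletion is not handled, and the intermediate product is computed as a long as an extra precaution against overflow.

```java
// A minimal sketch of a string hash table with linear probing.
// All names are illustrative; null marks an unoccupied slot.
class LinearProbingTable {
    private final String[] slots;
    private int count; // number of keys currently stored

    LinearProbingTable(int size) {
        slots = new String[size];
    }

    // Horner's method with the mod applied at each step.
    // The long intermediate keeps 31 * address from overflowing.
    int hash(String key) {
        int address = 0;
        for (int i = 0; i < key.length(); i++) {
            address = (int) ((31L * address + key.charAt(i)) % slots.length);
        }
        return address;
    }

    // Stores the key in the first free slot at or after its hash address.
    // Returns false if the table is full or the key is already present.
    boolean insert(String key) {
        if (count == slots.length) {
            return false;
        }
        int i = hash(key);
        while (slots[i] != null) {
            if (slots[i].equals(key)) {
                return false;
            }
            i = (i + 1) % slots.length; // wrap around at the end of the table
        }
        slots[i] = key;
        count++;
        return true;
    }

    // Returns the slot holding the key, or -1 if the search is unsuccessful.
    int indexOf(String key) {
        int i = hash(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null) {
                return -1; // empty slot: the key cannot be further along
            }
            if (slots[i].equals(key)) {
                return i;
            }
            i = (i + 1) % slots.length;
        }
        return -1; // every slot was probed without a match
    }

    boolean contains(String key) {
        return indexOf(key) >= 0;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(7); // prime size
        table.insert("a");     // hashes to slot 6
        table.insert("h");     // also hashes to 6, wraps around to slot 0
        table.insert("hello"); // hashes to 0, probes on to slot 1
        System.out.println(table.indexOf("hello"));
        System.out.println(table.contains("world"));
    }
}
```

In this small run, "a" and "h" collide at slot 6, so "h" wraps around to slot 0, and "hello" is then pushed from slot 0 to slot 1. That is a small instance of the primary clustering listed among the disadvantages of linear probing.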