Download CSC 2500 Computer Organization

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Lattice model (finance) wikipedia , lookup

Control table wikipedia , lookup

Binary search tree wikipedia , lookup

Comparison of programming languages (associative array) wikipedia , lookup

B-tree wikipedia , lookup

Bloom filter wikipedia , lookup

Hash table wikipedia , lookup

Rainbow table wikipedia , lookup

Transcript
CSC 2300
Data Structures & Algorithms
February 27, 2007
Chapter 5. Hashing
Today – Splay Trees and Hashing



Splay Trees – Delete a Node
Hashing – Overview
Hashing – Separate Chaining
Splay Tree – Delete a Node









How do we delete a node?
First, access the node.
This puts the node at the root.
After we delete the root, we get TL and TR.
Which node should we select as the new root?
The largest element in TL.
Find this element (easy), and rotate it to the root
position of TL.
Will this new root of TL have a right child?
If not, what can we do?
Examples in Class

We will illustrate the idea with a few examples in class:
http://webpages.ull.es/users/jriera/Docencia/AVL/AVL%20tree%20applet.htm
Zig-zag and Zag-zig

The web site uses a different (and perhaps more intuitive) way to describe
the actions of Zig-zag and Zag-zig. We will use the definitions in the text.
Zig-zag as in text:

The web site calls it zag-zig – we do a zag first, and then a zig.

Hashing – Overview
Hash Functions
Integer Input Keys




If the input keys are integers, then a good strategy
is to compute
key mod tableSize.
When may this strategy become a bad choice?
As an example, let tableSize=10. Can you suggest
a sequence of input keys that will all be mapped to
the same cell?
What is then a good choice for tableSize?
String Input Keys







Choose tableSize as a prime number – a good choice according to
the previous slide.
Since memory is cheap, construct a large table.
Let tableSize=10,007.
Suppose that all keys are eight or fewer characters long.
Use this hash function:
What can happen to the hash table?
Hint. What is the maximum integer value of an ASCII character?
String Input Keys







So, we want hashVal to be large – greater than 10,007.
Multiplication may be more appropriate than addition.
Try this hash function:
This function uses the first three characters of the input string.
We can check that 26 x 272 > 10,007.
Why do we use 26 x 272 and not 263 ?
What may go wrong with distribution in this hash table?
String Input Keys

Include also the ten numbers 0 to 9.
Get this hash function:

This function uses Horner’s rule. What is it?

Horner’s Rule





Problem 2.14 in text.
Evaluate f(x) = anxn + an-1xn-1 + … + a2x2 + a1x + a0
Code:
poly = 0;
for( i=n; i>=0; i-- )
poly = x * poly + a[i];
Why does this algorithm work?
What is its running time?
Compromises







A hash function needs not be the best with respect to table
distribution.
But it should be simple and reasonably fast.
If the keys are very long, the hash function will take too long to
compute.
A common practice is not to use all the characters.
As an example, consider a complete street address.
The hash function may include only a couple of characters
from the street address, and a couple of characters from the
city name and the zip code.
The idea is that the time saved in computing the hash function
will make up for a slightly less evenly distributed function.
Collision Resolution


If, when an element is inserted, it hashes to the
same value as an already inserted element, then
we have a collision and need to resolve it.
There are several methods for dealing with collision,
we will discuss the simplest approach: separate
chaining.
How to Handle Collisions
Separate Chaining


Keep a list of all elements that hash to the same value.
Example. The keys are the first 10 perfect squares and the
hash function is
hash(x) = x mod 10.
Hash Table with Separate Hashing
Example
Discussion








Another data structure could be used to resolve the collisions;
for example, binary search trees.
Why do we use linked lists instead?
We define the load factor, λ, of a hash table to be the ratio of the
number of elements in the table to the table size.
The average length of a list is λ.
The effort to perform a search is the constant time required to
evaluate the hash function plus the time to traverse the list.
In an unsuccessful search, what is the average number of nodes
to examine?
In a successful search, what is the average number of nodes to
examine?
Why 1+(λ/2) and not (λ/2)?
Hashing Applet

In class, we will run some hashing examples using this web site:
http://www.engin.umd.umich.edu/CIS/course.des/cis350/hashing/WEB/HashApplet.htm