* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Sets, Maps and Hash tables
Survey
Document related concepts
Transcript
Sets, Maps and Hash Tables
Sets
• We have learned that different data structures have different advantages – and
drawbacks
• Choosing the proper data structure
depends on typical usage patterns
• Array- and list-oriented data structures are
appropriate when the order of elements
matter – but that is not always the case
RHS – SOC
2
Sets
• A Set is a data structure which can hold an
unordered collection of elements
• Not having to worry about ordering can
improve performance of other operations
• On a Set, we want to be able to
– Insert an element
– Delete an element
– Check if a given element is in the Set
RHS – SOC
3
Sets
public interface Set<T>
{
void add(T element);
void remove();
boolean contains(T element);
Iterator<T> iterator();
}
RHS – SOC
4
Sets
• It turns out that insertion, deletion and
check for containment can be done in
O(log(n)), or even faster!
• Depends on the underlying implementation of the interface
• In Java, implementation is either
– HashSet (based on Hash Tables)
– TreeSet (based on Trees)
RHS – SOC
5
Sets
• A Set iterator is ”simpler” than e.g. a List
iterator
– Elements will occur in ”random” order
– No add method – we just call add on the Set
itself
– No previous method – does not make sense
• The Set iterator does however have a
delete method (why?)
RHS – SOC
6
Sets – Quality tip
• When using a Set, we must choose a specific implementation (HashSet or TreeSet)
• However, the definition should look like:
Set<Car> cars = new HashSet<Car>();
RHS – SOC
7
Sets – Quality tip
Set<Car> cars = new HashSet<Car>();
• Why…? We should in general only refer to
the interface, not the implementation
• Easy to switch implementation!
RHS – SOC
8
Maps
• A Map is a data structure which stores
associations between
– A collection of keys
– A collection of values
• All keys map to a value
• Keys are unique (values are not)
RHS – SOC
9
Maps
K1
V1
K2
V2
K3
V3
K4
RHS – SOC
10
Map
public interface Map<K,V>
{
void put(K key,V value);
V get(K key);
void remove(K key);
Set<K> keySet();
}
RHS – SOC
11
Map
• The keySet method
returns a Set containing
all keys in the Map
• You must then iterate
through this Set, in order
to get all values stored in
the Map
RHS – SOC
12
Map
Map<String,Car> carMap = new HashMap<String,Car>();
...
Set<String> regNumbers = carMap.keySet();
for (String regNo : regNumbers)
{
Car aCar = carMap.get(regNo);
... // Do something with the Car object
}
RHS – SOC
13
Exercises
• Review: R16.1, R16.4, R16.6
• Programming: P16.4, P16.12
RHS – SOC
14
Hash Tables
• A Set and a Map are both abstract data
types – we need a concrete implementation in order to use them
• In the Java library, two implementations
are available:
– Sets: HashSet, TreeSet
– Maps: HashMap, TreeMap
RHS – SOC
15
Hash Tables
• The implementations HashSet and
HashMap are based on a Hash Table
• A Hash Table is based on the below ideas:
– Create an array of length N, which can store
objects of some type T
– Find a mapping from T to the interval [0; N-1]
(a Hash Function f)
– Store an object t of type T in the position f(t)
RHS – SOC
16
Hash Tables
f(Car1) = 3
Car3
Car1
0
f(Car2) = 0
Car2
1
f(Car3) = 2
2
3
RHS – SOC
4
17
Hash Tables
• A Hash Table is thus ”almost” an array
• Instead of having an index directly
available, we must calculate it
• If calculation can be done in constant time,
then all basic operations (insert, delete,
lookup) can be done in constant time!
• Better than tree-based implementations,
which have O(log(N))
RHS – SOC
18
Hash Tables
• However, there are
some issues:
– How do we define a
good mapping from the
objects to [0; N-1]?
– What happens if we try
to store two objects at
the same position?
RHS – SOC
19
Hash Functions
• Before finding a good mapping – i.e. a
good hash function – we must consider
the size of the array
• For good performance, the array should at
least be as large as the maximal number
of objects stored
• Rule of thumb is about 30 % larger
• Size should be a prime number (???)
RHS – SOC
20
Hash Functions
• What if the expected number of objects is
unknown in advance?
• We can expand a hash table dynamically
• If the hash table in running out of space,
double the capacity
• Start out with a reasonably large array
(space is cheap…)
RHS – SOC
21
Hash Functions
• Having handled the choice of N, how do
we define a proper hash function?
• Properties of a hash function:
– Must map all objects of type T to the interval
[0; N-1]
– Should map objects as uniformly as possible
to the interval [0; N-1]
RHS – SOC
22
Hash Functions
• We can enforce the mapping to [0;N-1] by
using the modulo operator:
f(t) = g(t) % N
• g(t) can then produce any integer value
• How do we achieve a uniform distribution?
• Theory for this is complicated, but there
are some general rules to follow
RHS – SOC
23
Hash Functions
• A good hash function
should be ”almost random”, but deterministic
– ”Almost random” –
values are well distributed in the interval
– Deterministic – always
produce the same output
for the same input
RHS – SOC
24
Hash Functions
• In Java, all objects have a hashCode
method
– Defined in Object class
– Can be overrided
– Returns an integer (the Hash Code)
– We must use modulo on the value ourselves
RHS – SOC
25
Hash Functions
• Hash function for integers:
– The number itself…
• Hash function for strings:
final int HASH_MULTIPLIER = 31;
int h = 0;
for (int i = 0; i < s.length; i++)
h = (HASH_MULTIPLIER * h) + s.charAt(i);
RHS – SOC
26
Hash Functions
• Hash code for an object
can be calculated by
combining hash codes
for instance fields
• Combine values in a
way similar to the
algorithm used to find
string hash codes
RHS – SOC
27
Hash Functions
public int hashCode()
{
final int MULTIPLIER = 31;
int h1 = regNo.hashCode();
int h2 = mileage;
int h3 = model.hashCode();
int h = h1*MULTIPLIER + h2;
h = h*MULTIPLIER + h3;
return h;
}
RHS – SOC
28
Hash Functions
• But wait…what about numeric overflow?
• We multiply a ”random” integer value with
a number…?
• Does not really matter…
• As long as the algorithm is deterministic,
overflow is not a problem
• Just helps ”scrambling” the value
RHS – SOC
29
Hash Functions
• Common pitfalls:
– Remember to define a hashCode function
– If you forget, the hashCode implementation in
Object is used
– Based solely on memory location of object
– Two objects with the same value of instance
fields will produce different hash codes…
RHS – SOC
30
Hash Functions
• Common pitfalls:
– The hashCode function must be
”compatible” with your equals function
– If a.equals(b) it must hold that
a.hashCode() == b.hashCode()
– If not, duplicates are allowed!
– The reverse condition is not required; two
different objects may have the same hash
code
RHS – SOC
31
Hash Functions
• In general, you must
remember to:
– Either define the
hashCode and the
equals method
– Or not define any of
them!
RHS – SOC
32
Handling collisions
• Even with a good hash function, we will
still experience collisions
• Collision: two different objects t1 and t2
have the same hash code
• We will then try to store both objects in the
same position in the array
• Now what…?
RHS – SOC
33
Handling collisions
• What we store in each position in the array
is not the objects themselves, but a linked
list of objects
• Objects with the same hash code h are
stored in the linked list in position h
• With a good hash function, the average
length of non-empty lists is less than 2
RHS – SOC
34
Handling collisions
Car6
Car4
Car2
0
Car3
1
2
Car1
3
RHS – SOC
Car5
4
35
Handling collisions
• Basic operations (insert, delete, lookup)
follow this structure:
– Calculate hash code for the object
– Find the corresponding position in the array
• Insert: Insert element at the end of list
• Delete/Lookup: Iterate through list until element is
found, or end of list is reached
RHS – SOC
36
Handling collisions
• Basic operations are thus not done in truly
constant time
• However, if a proper hash function is used,
running time is constant in practice
• Use hash-based implementations unless
special circumstances apply
– Hard to define hash/equals function
– More functionality required
RHS – SOC
37
Exercises
• Review: R16.8, R16.10
• Programming: P16.6
RHS – SOC
38