Data Structures 1: Lists, Sets and Hashing

Welcome to CIS 068 !
Lesson 10:
Data Structures
CIS 068
Overview
Description, Usage and Java Implementation of
Collections
Lists
Sets
Hashing
Definition
Data Structures
Definition (www.nist.gov):
“An organization of information, usually in
memory, for better algorithm efficiency,
such as queue, stack, linked list, heap,
dictionary, and tree, or conceptual unity,
such as the name and address of a
person.”
Efficiency
“An organization of information …for better
algorithm efficiency...”:
Isn’t the efficiency of an algorithm defined by
the order of magnitude O( )?
Yes, but the order an algorithm actually achieves depends on its
implementation, in particular on how the data is organized.
Introduction
• Data structures define the structure of a
collection of data items, i.e. primitive values
or objects
• The structure provides different ways to
access the data
• Different tasks need different ways to
access the data
• Different tasks need different data
structures
Introduction
Typical properties of different structures:
• fixed length / variable length
• access by index / access by iteration
• duplicate elements allowed / not allowed
Examples
Tasks:
• Read 300 integers
• Read an unknown number of integers
• Read 5th element of sorted collection
• Read next element of sorted collection
• Merge element at 5th position into collection
• Check if object is in collection
Examples
Although you can invent any data structure
you want, there are ‘classic structures’,
providing:
• Coverage of most (classic) problems
• Analysis of efficiency
• Basic implementation in modern languages,
like Java
Data Structures in Java
Let’s see what Java has to offer:
The Collection Hierarchy
Collection: top interface, specifying requirements for
all collections
Collection Interface
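The interface listing from the slide is not preserved here. As a minimal sketch, the following uses a few methods that java.util.Collection is known to declare (add, remove, contains, size, isEmpty). The class name CollectionDemo and the choice of ArrayList as the concrete class are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;

public class CollectionDemo {
    // Builds a small collection using only methods of the Collection interface.
    // ArrayList is an arbitrary choice; any Collection implementation would do.
    static Collection<String> build() {
        Collection<String> c = new ArrayList<>();
        c.add("Apu");
        c.add("Bob");
        c.add("Daria");
        c.remove("Bob");
        return c;
    }

    public static void main(String[] args) {
        Collection<String> c = build();
        System.out.println(c.size());          // 2
        System.out.println(c.contains("Apu")); // true
        System.out.println(c.isEmpty());       // false
    }
}
```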
Iterator Interface
Purpose:
• Sequential access to collection elements
• Note: the technique used so far, accessing elements
sequentially by index, is not reasonable in
general (why?) !
Methods: hasNext(), next(), remove()
Iterator Interface
Iterator points ‘between’ the elements of the collection (elements 1 2 3 4 5):
• First position (before element 1): hasNext() = true, remove() throws an error
• Current position after 2 calls to next(): the returned element is 2, remove() deletes element 2
• Position after the last next(): hasNext() = false
Iterator Interface Usage
Typical usage of iterator:
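The code from the slide is not preserved here. A minimal sketch of the usual hasNext() / next() loop could look as follows. The class name IteratorUsage and the task (summing odd numbers while removing even ones) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorUsage {
    // Walks the list once, removes even numbers through the iterator
    // and returns the sum of the remaining (odd) numbers.
    static int sumOddRemoveEven(List<Integer> numbers) {
        int sum = 0;
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value % 2 == 0) {
                it.remove();   // deletes the element just returned by next()
            } else {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(sumOddRemoveEven(numbers)); // 9
        System.out.println(numbers);                   // [1, 3, 5]
    }
}
```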
Back to Collections
AbstractCollection
AbstractCollection
• Facilitates implementation of Collection
interface
• Providing a skeletal implementation
• Implementation of a concrete class:
• Provide data structure (e.g. array)
• Provide access to data structure
AbstractCollection
• The concrete class must provide an implementation of
Iterator
• To keep the data representation abstract, the methods already
implemented (non-abstract) in AbstractCollection
use Iterator methods to access the data
AbstractCollection:
contains(){
Iterator i=iterator();
…
}
clear(){
Iterator i=iterator();
…
}
myCollection (concrete class):
implements Iterator;
int[ ] data;
Iterator iterator(){
return this;
}
hasNext(){
…
}
…
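As a hedged sketch of this idea, the class below extends java.util.AbstractCollection and supplies only size() and iterator(); inherited methods such as contains() and toString() then work through the iterator. The class name IntArrayCollection is hypothetical, and unlike the diagram it returns a separate iterator object instead of `this`, so that several traversals can run independently. Note that in java.util.AbstractCollection, add() throws UnsupportedOperationException unless it is overridden.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical example: a read-only collection backed by an int array.
public class IntArrayCollection extends AbstractCollection<Integer> {
    private final int[] data;

    public IntArrayCollection(int... values) {
        this.data = values.clone();
    }

    @Override
    public int size() {
        return data.length;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = 0;   // index of the element returned by the next call

            @Override
            public boolean hasNext() {
                return next < data.length;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return data[next++];
            }
        };
    }

    public static void main(String[] args) {
        IntArrayCollection c = new IntArrayCollection(3, 1, 4);
        System.out.println(c.contains(4)); // true, inherited method using iterator()
        System.out.println(c.contains(7)); // false
        System.out.println(c);             // [3, 1, 4]
    }
}
```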
Back to Collections
List Interface
List Interface
• Extends the Collection Interface
• Adds methods to insert and retrieve objects
by their position (index)
• Note: Collection Interface could NOT
specify the position
• A new Iterator, the ListIterator, is introduced
• ListIterator extends Iterator, allowing for
bidirectional traversal (previous(), previousIndex(), ...)
List Interface
• Incorporates an index !
• A new Iterator type (can move forward and backward)
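A short sketch of both additions, index-based access and backward traversal with a ListIterator. The class name ListInterfaceDemo and the helper reversed() are made up for this example; the methods called on List and ListIterator are standard java.util ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;

public class ListInterfaceDemo {
    // Builds a list using index-based insertion.
    static List<String> build() {
        List<String> list = new ArrayList<>();
        list.add("Apu");
        list.add("Daria");
        list.add(1, "Bob");   // insert at index 1, between Apu and Daria
        return list;
    }

    // Traverses the list backward, starting behind the last element.
    static String reversed(List<String> list) {
        StringBuilder sb = new StringBuilder();
        ListIterator<String> it = list.listIterator(list.size());
        while (it.hasPrevious()) {
            sb.append(it.previous());
            if (it.hasPrevious()) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> list = build();
        System.out.println(list.get(1));    // Bob
        System.out.println(reversed(list)); // Daria Bob Apu
    }
}
```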
Example: Selection-Sorting a List
Part 1: Call to selection sort
• The actual implementation of List does not matter !
• The caller uses only the Iterator properties of the ListIterator (upcasting)
Part 2: Selection sort
• Access at index ‘fill’
• Inner loop
• Swap
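The original code is not preserved in this transcript. The following is a sketch of a selection sort written only against the List interface, reusing the name ‘fill’ from the slide; everything else (class name, element type Integer, the sample values) is an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class SelectionSortDemo {
    // Part 2: selection sort using only List methods (get, set, size).
    static void selectionSort(List<Integer> list) {
        for (int fill = 0; fill < list.size() - 1; fill++) {
            int minIndex = fill;
            // Inner loop: find the smallest element in the unsorted rest.
            for (int i = fill + 1; i < list.size(); i++) {
                if (list.get(i) < list.get(minIndex)) {
                    minIndex = i;
                }
            }
            // Swap the smallest element into position 'fill'.
            Integer tmp = list.get(fill);
            list.set(fill, list.get(minIndex));
            list.set(minIndex, tmp);
        }
    }

    public static void main(String[] args) {
        // Part 1: the concrete List type does not matter to selectionSort.
        List<Integer> list = new LinkedList<>(Arrays.asList(5, 2, 4, 1, 3));
        selectionSort(list);
        // A ListIterator is used through the plain Iterator type (upcasting).
        Iterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Because get(i) on a linked list is O(n), this version is only reasonable for array-based lists; it is meant to show the interface, not an efficient sort.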
Back to Collections
AbstractList: ...again the implementation of
some methods...
Note:
Still ABSTRACT !
Concrete Lists
ArrayList and Vector:
at last concrete implementations !
ArrayList and Vector
Vector:
• Kept for compatibility reasons (only)
• Use ArrayList instead
ArrayList:
• Underlying data structure is an array
• List properties add advantages over a plain array:
• Size can grow and shrink
• Elements can be inserted and removed in the middle
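A brief sketch of these two advantages, using only standard ArrayList methods; the class name ArrayListDemo and the sample data are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayListDemo {
    static List<String> demo() {
        List<String> names = new ArrayList<>(); // no fixed length required
        names.add("Apu");
        names.add("Daria");
        names.add(1, "Bob");  // insert in the middle; Daria is shifted to index 2
        names.remove(0);      // remove Apu; the list shrinks, the rest is shifted
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [Bob, Daria]
    }
}
```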
An Alternative Implementation (1)
An Alternative Implementation (2)
An Alternative Implementation (3)
Collections
The underlying array data structure has
• advantages for index-based access: O(1)
• disadvantages for insertion / removal of middle
elements: the following elements must be copied, so insertion / removal is O(n)
• Alternative: linked lists
Linked List
Flexible structure, providing
• Insertion and removal at any place in O(1) once the position
(node) is known, compared to O(n) for an array-based list
• Sequential access
• Random access in O(n), compared to O(1) for an
array-based list
Linked List
• List of dynamically allocated nodes
• Nodes arranged into a linked structure
• Data Structure ‘node‘ must provide
• Data itself (example: the bead-body)
• A possible link to another node (ex.: the link)
Children’s pop-beads as an example for a linked list
Linked List
New node: next -> old node: next -> (null)
Connecting Nodes
creating the nodes
connecting
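The slide's code is not preserved. A minimal sketch of the two steps, assuming a hypothetical Node class with the fields ‘data’ and ‘link’ (the field name ‘link’ follows the p.link notation used in this lesson):

```java
public class ConnectingNodes {
    // Hypothetical node type: the data itself plus a possible link to another node.
    static class Node {
        String data;
        Node link;   // null means: no following node

        Node(String data) {
            this.data = data;
        }
    }

    static Node buildTwoNodeList() {
        // creating the nodes
        Node p = new Node("first");
        Node q = new Node("second");
        // connecting
        p.link = q;
        return p;
    }

    public static void main(String[] args) {
        Node head = buildTwoNodeList();
        System.out.println(head.data);              // first
        System.out.println(head.link.data);         // second
        System.out.println(head.link.link == null); // true
    }
}
```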
Inserting Nodes
Insert node r between p and its successor q:
p.link = r
r.link = q
Afterwards q can be accessed by p.link.link
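A sketch of the insertion, again assuming a hypothetical Node class with ‘data’ and ‘link’ fields. The helper insertAfter() is made up; it sets r.link before p.link so that the old successor does not have to be kept in a separate variable q.

```java
public class InsertingNodes {
    // Hypothetical node type (assumption, not taken from the slides).
    static class Node {
        String data;
        Node link;

        Node(String data) {
            this.data = data;
        }
    }

    // Inserts r directly behind p in O(1).
    static void insertAfter(Node p, Node r) {
        r.link = p.link;  // r now points to p's old successor (q)
        p.link = r;       // p now points to r
    }

    // Builds p -> q, inserts r in between and returns p.
    static Node demo() {
        Node p = new Node("p");
        Node q = new Node("q");
        p.link = q;
        insertAfter(p, new Node("r"));
        return p;
    }

    public static void main(String[] args) {
        Node p = demo();
        System.out.println(p.link.data);      // r
        System.out.println(p.link.link.data); // q
    }
}
```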
Removing Nodes
[Figure: removing a node; nodes labeled p and q]
Traversing a List
(null)
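Removal and traversal can be sketched as follows. The Node class and the helpers listOf(), removeAfter() and traverse() are hypothetical names chosen for this example; the slides' own code is not preserved.

```java
public class TraversingNodes {
    // Hypothetical node type (assumption, not taken from the slides).
    static class Node {
        String data;
        Node link;

        Node(String data) {
            this.data = data;
        }
    }

    // Builds a singly linked list from the given items and returns its head.
    static Node listOf(String... items) {
        Node head = null;
        for (int i = items.length - 1; i >= 0; i--) {
            Node n = new Node(items[i]);
            n.link = head;
            head = n;
        }
        return head;
    }

    // Removes the node behind p (if any) by bypassing it: O(1).
    static void removeAfter(Node p) {
        if (p.link != null) {
            p.link = p.link.link;
        }
    }

    // Follows the links until (null) is reached and collects the data.
    static String traverse(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.link) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = listOf("a", "b", "c");
        System.out.println(traverse(head)); // a b c
        removeAfter(head);
        System.out.println(traverse(head)); // a c
    }
}
```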
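Double Linked Lists
Single linked list: each node holds data and a successor link; the last node's successor is (null)
Double linked list: each node holds data, a successor link and a predecessor link; the first node's predecessor and the last node's successor are (null)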
Back to Collections
AbstractSequentialList and LinkedList
LinkedList
An implementation example:
See textbook
Sets
Example task:
Check whether the collection contains object o
Solution using a List:
-> O(n) operation !
Sets
Comparison to List:
• Set is designed to overcome the limitation of O(n)
• Contains unique elements
• contains() / remove() operate in O(1) (hash-based set) or O(log n) (tree-based set)
• No get() method, no index-access...
• ...but iterator can (still) be used to traverse set
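A small sketch of these properties using java.util.HashSet (java.util.TreeSet would be the tree-based alternative). The class name SetDemo and the helper countUnique() are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetDemo {
    // Counts distinct elements: add() returns false for duplicates.
    static int countUnique(List<String> names) {
        Set<String> unique = new HashSet<>();
        for (String name : names) {
            unique.add(name);
        }
        return unique.size();
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("Apu", "Bob", "Daria"));
        System.out.println(set.add("Apu"));      // false, already contained
        System.out.println(set.contains("Bob")); // true, no index involved
        // No get(index); traversal is still possible with an iterator.
        for (String name : set) {
            System.out.println(name);            // order is not guaranteed
        }
        System.out.println(countUnique(Arrays.asList("Apu", "Bob", "Apu"))); // 2
    }
}
```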
Back to Collections
Interface Set
Hashing
How can method ‘contains()’ be implemented to be
an O(1) operation ?
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
Idea:
• Retrieving an object from an array can be done in
O(1) if the index is known
• Determine the index to store and retrieve an
object from the object itself !
Hashing
Determine the index ... by the object itself:
Example:
Store Strings “Apu“, “Bob“, “Daria“ as Set.
Define function H: String -> integer:
• Take first character, A=1, B=2,...
Store names in String array at position H(name)
Hashing
Apu: first character A, H(A) = 1
Bob: first character B, H(B) = 2
Daria: first character D, H(D) = 4
...
Resulting array, positions 1, 2, 3, ...: Apu, Bob, (unused), Daria, (unused), …
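The example can be sketched in code as follows. The class name FirstLetterHash, the table size of 27 (index 0 unused, 1 to 26 for A to Z) and the restriction to names starting with a letter A to Z are assumptions; collisions are deliberately ignored at this stage.

```java
public class FirstLetterHash {
    // H from the example: take the first character, A=1, B=2, ...
    static int h(String name) {
        return Character.toUpperCase(name.charAt(0)) - 'A' + 1;
    }

    // Stores each name at position H(name); a later name with the same
    // first letter would simply overwrite the earlier one (no collision handling).
    static String[] store(String... names) {
        String[] table = new String[27];
        for (String name : names) {
            table[h(name)] = name;
        }
        return table;
    }

    // O(1): compute the index from the object itself and look only there.
    static boolean contains(String[] table, String name) {
        return name.equals(table[h(name)]);
    }

    public static void main(String[] args) {
        String[] table = store("Apu", "Bob", "Daria");
        System.out.println(h("Daria"));             // 4
        System.out.println(contains(table, "Bob")); // true
        System.out.println(contains(table, "Abe")); // false
    }
}
```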
Hashing
• The function H(o) is called the hashcode of the object o
• Properties of a hashcode function:
• If a.equals(b) then H(a) = H(b)
• BUT NOT NECESSARILY VICE VERSA:
• H(a) = H(b) does NOT guarantee a.equals(b) !
• If H() has ‘sufficient variation’, then it is very likely that
different objects have different hashcodes
Hashing
• Additionally an array is needed
that has sufficient space to
contain at least all elements.
• The hashcode must not address
an index outside the array; this
can easily be achieved by:
• H1(o) = H(o) % n
• % = modulo function, n =
array length
• The larger the array, the more
H1() varies !
Array: Apu, Bob, (unused), Daria, (unused), …
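One Java-specific detail when applying this: hashCode() may return a negative int, and Java's % operator then yields a negative result. The sketch below uses Math.floorMod to stay inside the array; the class name IndexDemo and the helper index() are made up for illustration.

```java
public class IndexDemo {
    // H1(o) = H(o) mod n, always in the range 0 .. n-1 for n > 0.
    static int index(Object o, int n) {
        return Math.floorMod(o.hashCode(), n);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 5);                      // -2, not a valid index
        System.out.println(Math.floorMod(-7, 5));        // 3
        System.out.println(index(Integer.valueOf(-7), 5)); // 3, Integer's hashcode is its value
    }
}
```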
Hashing
Back to the example:
Insert ‘Abe’
First character: A, H(A) = 1
H(Apu) = H(Abe); this is called a
collision
Array: Apu, Bob, (unused), Daria, (unused), …
Solving Collisions
Method 1:
Don’t use an array of objects, but an array of linked lists !
ARRAY (contains the start of each linked list):
• Apu -> Abe
• Bob
• (unused)
• Daria
• (unused)
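A minimal sketch of this method, assuming String elements, java.util.LinkedList for the chains and String.hashCode() as H. The class name ChainedStringSet is hypothetical, and this is not how java.util.HashSet is actually implemented.

```java
import java.util.LinkedList;

public class ChainedStringSet {
    // Each array slot holds the start of a linked list (or null if unused).
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedStringSet(int n) {
        buckets = new LinkedList[n];
    }

    private int index(String s) {
        return Math.floorMod(s.hashCode(), buckets.length);
    }

    // Returns false if s is already contained.
    public boolean add(String s) {
        int i = index(s);
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();
        }
        if (buckets[i].contains(s)) {
            return false;
        }
        buckets[i].add(s);   // colliding elements share one list
        return true;
    }

    public boolean contains(String s) {
        LinkedList<String> bucket = buckets[index(s)];
        return bucket != null && bucket.contains(s);
    }

    public static void main(String[] args) {
        ChainedStringSet set = new ChainedStringSet(4);
        set.add("Aa");
        set.add("BB");   // "Aa" and "BB" have the same String hashcode
        System.out.println(set.contains("Aa")); // true
        System.out.println(set.contains("BB")); // true
        System.out.println(set.contains("Cc")); // false
    }
}
```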
Solving Collisions
Drawback:
• Objects must be ‘wrapped’ in a node structure to provide
links, introducing additional memory overhead per element
’Apu’ -> wrap -> Node (’Apu’, link)
Solving Collisions
Method 2:
• Iteratively apply different hashcodes H0, H1, H2, ... to object
o until the collision is resolved
• As long as the different hashcodes are used in the same order, the
search is guaranteed to be consistent
[Figure: ARRAY with entries Apu, Bob, (unused), Daria, (unused); probes H0, H1, H2]
Solving Collisions
The easiest hashcode series Hinc:
H0 = H
Hi = Hi-1 + 1 (i.e. Hi = H + i)
[Figure: ARRAY with entries Apu, Bob, (unused), Daria, (unused); probes H0, H1, H2]
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
add
Example implementation of ‘add(Object o)’ using Hinc
(assume array A has length n and at least one free slot, H as given above)
int index = H(o) % n;
while (A[index] != null) {
    if (o.equals(A[index]))
        break;                    // o is already in the set
    index = (index + 1) % n;      // probe the next slot
}
A[index] = o;                     // add element at position A[index]
contains
Example implementation of ‘contains(Object o)’ using Hinc
(assume array A has length n and at least one free slot, H as given above)
int index = H(o) % n;
boolean found = false;
while (A[index] != null) {
    if (o.equals(A[index])) {
        found = true;
        break;
    }
    index = (index + 1) % n;      // probe the next slot
}
// ‘found’ is true if the set contains object o
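Put together, add() and contains() with the Hinc series could be sketched as a small runnable class. The class name LinearProbingSet is hypothetical, H is assumed to be Object.hashCode(), and the probe counter is an addition that stops the loops when the table is full instead of looping forever. Removal is left out, because simply setting a slot back to null would break later searches.

```java
public class LinearProbingSet {
    private final Object[] table;

    public LinearProbingSet(int n) {
        table = new Object[n];
    }

    // H0(o): start index inside the array, also for negative hashcodes.
    private int start(Object o) {
        return Math.floorMod(o.hashCode(), table.length);
    }

    // Returns true if o was added, false if it was already contained.
    public boolean add(Object o) {
        int index = start(o);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[index] == null) {
                table[index] = o;
                return true;
            }
            if (o.equals(table[index])) {
                return false;
            }
            index = (index + 1) % table.length;   // Hi = Hi-1 + 1
        }
        throw new IllegalStateException("table is full");
    }

    public boolean contains(Object o) {
        int index = start(o);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[index] == null) {
                return false;   // an unused slot ends the search
            }
            if (o.equals(table[index])) {
                return true;
            }
            index = (index + 1) % table.length;   // same order as in add()
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(8);
        set.add("Aa");
        set.add("BB");   // same String hashcode as "Aa": stored in the next slot
        System.out.println(set.contains("BB")); // true
        System.out.println(set.contains("Cc")); // false
    }
}
```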
Analysis
• If there is no collision, contains() operates in O(1)
• If the set contains elements having the same hashcode,
there is a collision. With dupmax being the maximum number of
elements having the same hashcode, contains() operates
in O(dupmax)
• If dupmax is near n, there is no increase in speed, since
contains() operates in O(n)
A Real Hashcode
• Java provides a hashcode for every object:
method hashCode() in java.lang.Object
• The implementation of hashCode() for e.g. String
computes:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
n = length of string, s[i] = character at position i
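The formula can be checked against String.hashCode() with a few lines. The helper polynomialHash() is a made-up name; it evaluates the polynomial step by step in int arithmetic, so overflow wraps around in the same way. The pair "Aa" / "BB" also illustrates that equal hashcodes do not imply equal objects.

```java
public class StringHashDemo {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("Apu") == "Apu".hashCode()); // true
        System.out.println("Aa".hashCode());                           // 2112
        System.out.println("BB".hashCode());                           // 2112
        System.out.println("Aa".equals("BB"));                         // false
    }
}
```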
Rehashing a table
What happens if the array is full ?
• Create new array, e.g. double size, and insert all elements
of old table into new table
• Note: the elements won’t keep their index, since the
array length n used in the modulo function has changed !
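A sketch of rehashing into an array of double size, assuming String elements, String.hashCode() as H and the Hinc series for collisions. The class name RehashDemo and the helper rehash() are hypothetical.

```java
public class RehashDemo {
    // Creates a table of double size and re-inserts every element,
    // computing its index again with the new array length.
    static String[] rehash(String[] old) {
        String[] bigger = new String[old.length * 2];
        for (String s : old) {
            if (s == null) {
                continue;   // skip unused slots
            }
            int index = Math.floorMod(s.hashCode(), bigger.length);
            while (bigger[index] != null) {
                index = (index + 1) % bigger.length;   // Hinc probing
            }
            bigger[index] = s;
        }
        return bigger;
    }

    public static void main(String[] args) {
        // "a" has hashcode 97, "b" has 98: with n = 2 they sit at index 1 and 0.
        String[] old = { "b", "a" };
        String[] bigger = rehash(old);
        // With n = 4 the indices become 97 % 4 = 1 and 98 % 4 = 2.
        System.out.println(bigger[1]); // a
        System.out.println(bigger[2]); // b
    }
}
```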
Hashcode Summary
• A hashtable provides the Set operations add() and
contains() in O(1) if the hashcode is chosen
properly and the array allows for sufficient
variation
• Speed is gained by using more memory
• If many collisions occur, a hashtable might
be slower than a list due to overhead
(computation of H, ...)