Welcome to CIS 068!
Lesson 10: Data Structures

Overview
Description, usage, and Java implementation of collections:
• Lists
• Sets
• Hashing

Definition
Data structures, as defined by NIST (www.nist.gov):
“An organization of information, usually in memory, for better algorithm efficiency, such as queue, stack, linked list, heap, dictionary, and tree, or conceptual unity, such as the name and address of a person.”

Efficiency
“An organization of information … for better algorithm efficiency ...”
Isn’t the efficiency of an algorithm defined by its order of magnitude, O( )?
Yes, but it depends on the implementation.

Introduction
• Data structures define the structure of a collection of data, i.e. of primitive values or objects
• The structure provides different ways to access the data
• Different tasks need different ways to access the data
• Different tasks need different data structures

Typical properties of different structures:
• fixed length / variable length
• access by index / access by iteration
• duplicate elements allowed / not allowed

Examples
Tasks:
• Read 300 integers
• Read an unknown number of integers
• Read the 5th element of a sorted collection
• Read the next element of a sorted collection
• Merge an element at the 5th position into a collection
• Check whether an object is in a collection

Although you can invent any data structure you want, there are ‘classic structures’, which provide:
• Coverage of most (classic) problems
• Analysis of efficiency
• Basic implementations in modern languages such as Java

Data Structures in Java
Let’s see what Java has to offer.

The Collection Hierarchy
Collection: the top interface, specifying the requirements for all collections.

Collection Interface
[Slide figure: the Collection interface]
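To make the role of the top interface concrete, here is a minimal sketch in which the same code fills two different collections through the Collection interface only. The class name CollectionDemo, the helper fill(), and the choice of ArrayList and HashSet from java.util are assumptions made for illustration; they are not taken from the slides.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

class CollectionDemo {
    // Adds the same values to any Collection; the concrete type decides
    // whether the duplicate is stored.
    static void fill(Collection<Integer> c) {
        c.add(3);
        c.add(5);
        c.add(3); // duplicate value
    }

    public static void main(String[] args) {
        Collection<Integer> list = new ArrayList<>(); // duplicates allowed
        Collection<Integer> set = new HashSet<>();    // duplicates not allowed
        fill(list);
        fill(set);
        System.out.println(list.size());      // 3
        System.out.println(set.size());       // 2
        System.out.println(list.contains(5)); // true
    }
}
```

The caller of fill() never needs to know which structure sits behind the interface, which is the point of specifying common requirements at the top of the hierarchy.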

Iterator Interface
Purpose:
• Sequential access to the elements of a collection
• Note: the technique used so far, accessing elements sequentially by index, is not reasonable in general (why?)
Methods: hasNext(), next(), remove()

The iterator points ‘between’ the elements of the collection (here: 1 2 3 4 5):
• First position (before element 1): hasNext() = true, remove() throws an error
• Current position after 2 calls to next(): the returned element is 2, remove() deletes element 2
• Position after the last call to next(): hasNext() = false

Iterator Interface Usage
Typical usage of an iterator: [code shown on slide]

Back to Collections: AbstractCollection

AbstractCollection
• Facilitates the implementation of the Collection interface by providing a skeletal implementation
• To implement a concrete class:
  • Provide a data structure (e.g. an array)
  • Provide access to the data structure
• The concrete class must provide an implementation of Iterator
• To maintain the ‘abstract character’ of the data, the implemented (non-abstract) methods of the abstract class use Iterator methods to access the data

  AbstractCollection:
    contains() { Iterator i = iterator(); … }
    clear()    { Iterator i = iterator(); … }
  myCollection implements Iterator:
    int[] data;
    Iterator iterator() { return this; }
    hasNext() { … }
    …

Back to Collections: List Interface

List Interface
• Extends the Collection interface
• Adds methods to insert and retrieve objects by their position (index)
• Note: the Collection interface could NOT specify the position
• A new iterator, the ListIterator, is introduced
• ListIterator extends Iterator, allowing for bidirectional traversal (previous(), previousIndex(), ...)

Annotations on the interface listing:
• Incorporates an index!
• A new iterator type (can move forward and backward)

Example: Selection-Sorting a List
Part 1: call to selection sort
The actual implementation of List does not matter!
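The slide's own code was not preserved, so the following is only a sketch of what a selection sort written purely against the List interface might look like. The class name SelectionSortDemo, the element type Integer, and the sample values are assumptions. The variable name fill follows the slide's wording; the result is printed through a ListIterator that is used only as a plain Iterator (upcasting).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class SelectionSortDemo {
    // Sorts the list in place, using only methods of the List interface.
    static void selectionSort(List<Integer> list) {
        for (int fill = 0; fill < list.size() - 1; fill++) {
            int min = fill;
            // Inner loop: find the smallest element in the unsorted rest.
            for (int i = fill + 1; i < list.size(); i++) {
                if (list.get(i) < list.get(min)) {
                    min = i;
                }
            }
            // Swap the smallest element into position 'fill'.
            Integer tmp = list.get(fill);
            list.set(fill, list.get(min));
            list.set(min, tmp);
        }
    }

    public static void main(String[] args) {
        // Any List implementation works; LinkedList is an arbitrary choice here.
        List<Integer> list = new LinkedList<>(Arrays.asList(4, 1, 3, 2));
        selectionSort(list);
        // Upcast: the ListIterator is used through the Iterator interface only.
        Iterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            System.out.print(it.next() + " ");
        }
        System.out.println();
    }
}
```

Swapping LinkedList for ArrayList changes nothing in selectionSort(), although index access with get() costs O(n) per call on a linked list, so the running time differs.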
Call to SelectionSort
Use only the Iterator properties of ListIterator (upcasting)

Example: Selection-Sorting a List
Part 2: selection sort
• Access at index ‘fill’
• Inner loop
• Swap

Back to Collections: AbstractList
... again the implementation of some methods ...
Note: still ABSTRACT!

Concrete Lists
ArrayList and Vector: at last, concrete implementations!

ArrayList and Vector
Vector:
• Kept for compatibility reasons (only)
• Use ArrayList instead
ArrayList:
• The underlying data structure is an array
• List properties add advantages over a plain array:
  • Size can grow and shrink
  • Elements can be inserted and removed in the middle

An Alternative Implementation (1) to (3)
[Slide code]

Collections
The underlying array data structure has
• advantages for index-based access
• disadvantages for insertion / removal of middle elements: the following elements must be copied, so insertion and removal are O(n)
• Alternative: linked lists

Linked List
A flexible structure, providing
• Insertion and removal at any place in O(1) once the position is known, compared to O(n) for an array-based list
• Sequential access
• Random access in O(n), compared to O(1) for an array-based list

Linked List
• A list of dynamically allocated nodes
• Nodes are arranged into a linked structure
• The data structure ‘node’ must provide
  • The data itself (example: the bead body)
  • A possible link to another node (example: the link)
Children’s pop-beads serve as an example of a linked list.

Linked List
[Slide figure: new node, ‘next’ pointing to the old node, whose ‘next’ is (null)]

Connecting Nodes
[Slide figure: creating the nodes, then connecting them]

Inserting Nodes
Insert node r between p and q:
  p.link = r
  r.link = q
q can then be accessed by p.link.link

Removing Nodes
[Slide figure: nodes p and q]

Traversing a List
[Slide figure: traversal ends at (null)]

Double Linked Lists
• Single linked list: each node holds data and a successor; the last successor is (null)
• Double linked list: each node holds data, a successor, and a predecessor; the first predecessor and the last successor are (null)

Back to Collections: AbstractSequentialList and LinkedList

LinkedList
An implementation example: see textbook.

Sets
Example task: examine whether a collection contains object o.
Solution using a List: an O(n) operation!

Sets
Comparison to List:
• Set is designed to overcome the O(n) limitation
• Contains unique elements
• contains() / remove() operate in O(1) or O(log n)
• No get() method, no index access ...
• ... but an iterator can (still) be used to traverse the set

Back to Collections: interface Set

Hashing
How can the method contains() be implemented as an O(1) operation?
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html

Idea:
• Retrieving an object from an array takes O(1) if the index is known
• Determine the index at which to store and retrieve an object from the object itself!

Hashing
Determine the index ... by the object itself:
Example: store the Strings “Apu”, “Bob”, “Daria” as a set.
Define a function H: String -> integer:
• Take the first character: A=1, B=2, ...
Store the names in a String array at position H(name).

  Apu:   first character A, H(A) = 1
  Bob:   first character B, H(B) = 2
  Daria: first character D, H(D) = 4
  Array (positions 1, 2, 3, 4, 5, …): Apu | Bob | (unused) | Daria | (unused) | …

Hashing
• The function H(o) is called the hash code of the object o
• Properties of a hash code function:
  • If a.equals(b), then H(a) = H(b)
  • BUT NOT NECESSARILY VICE VERSA: H(a) = H(b) does NOT guarantee a.equals(b)!
  • If H() has ‘sufficient variation’, it is most likely that different objects have different hash codes

Hashing
• Additionally, an array is needed that has sufficient space to contain at least all elements
• The hash code must not address an index outside the array; this can easily be achieved by:
  H1(o) = H(o) % n
  where % is the modulo function and n is the array length
• The larger the array, the more H1() varies!
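The first-letter hash and the modulo step can be sketched in a few lines. The class name FirstLetterHash, the method names h and h1, and the array length 8 are assumptions for illustration; the sketch also assumes non-empty names that start with an uppercase letter A to Z.

```java
import java.util.Arrays;

class FirstLetterHash {
    // H: take the first character, A=1, B=2, ...
    // Assumes a non-empty name starting with an uppercase letter A-Z.
    static int h(String name) {
        return name.charAt(0) - 'A' + 1;
    }

    // H1(o) = H(o) % n keeps the index inside an array of length n.
    static int h1(String name, int n) {
        return h(name) % n;
    }

    public static void main(String[] args) {
        String[] table = new String[8]; // array length n = 8, chosen arbitrarily
        for (String name : new String[] {"Apu", "Bob", "Daria"}) {
            table[h1(name, table.length)] = name;
        }
        // prints [null, Apu, Bob, null, Daria, null, null, null]
        System.out.println(Arrays.toString(table));
    }
}
```

With n = 8, a name such as "Zed" (H = 26) lands at index 26 % 8 = 2, the slot already used by "Bob", which shows how a small array reduces the variation of H1().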

Hashing
Back to the example: insert ‘Abe’.
First character: A, so H(A) = 1
H(Apu) = H(Abe): this is called a collision.

Solving Collisions
Method 1: Don’t use an array of objects, but an array of linked lists!
  Array (each slot contains the start of a linked list): Apu -> Abe | Bob | (unused) | Daria | (unused)

Solving Collisions
Drawback:
• Objects must be ‘wrapped’ in a node structure to provide the links, which introduces a considerable overhead (node = ’Apu’ + link)

Solving Collisions
Method 2:
• Iteratively apply different hash codes H0, H1, H2, ... to object o until the collision is resolved
• As long as the different hash codes are used in the same order, the search is guaranteed to be consistent

Solving Collisions
The easiest hash code series, Hinc:
  H0 = H
  Hi = Hi-1 + 1
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html

add
Example implementation of ‘add(Object o)’ using Hinc (assume array A has length n and at least one free slot, H as given above):

  index = H(o) % n
  while ( A[index] != null ) {
      if ( o.equals(A[index]) ) break;      // o is already in the set
      else index = (index + 1) % n;         // try the next slot
  }
  A[index] = o                              // add element at position A[index]

contains
Example implementation of ‘contains(Object o)’ using Hinc (assume array A has length n and at least one free slot, H as given above):

  index = H(o) % n
  found = false
  while ( A[index] != null ) {
      if ( o.equals(A[index]) ) { found = true; break; }
      else index = (index + 1) % n;
  }
  // ‘found’ is true if the set contains object o

Analysis
• If there is no collision, contains() operates in O(1)
• If the set contains elements having the same hash code, there is a collision. With dupmax being the maximum number of elements that share one hash code, contains() operates in O(dupmax)
• If dupmax is near n, there is no increase in speed, since contains() operates in O(n)

A Real Hashcode
• Java provides a hash code for every object: method hashCode() in java.lang.Object
• The hash code of, e.g., a String is computed as:
  s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
  where n = length of the string, s[i] = character at position i

Rehashing a table
What happens if the array is full?
• Create a new array, e.g. of double size, and insert all elements of the old table into the new table
• Note: the elements won’t keep their index, since the modulo function applied to the hash code has changed!

Hashcode Summary
• A hash table provides the set operations add() and contains() in O(1) if the hash code is chosen properly and the array allows for sufficient variation
• Speed is gained by using more memory
• If multiple collisions occur, a hash table might be slower than a list due to overhead (computation of H, ...)
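
The add, contains, and rehashing steps described above can be combined into one small class. This is a sketch under stated assumptions, not the course's implementation: the class name ProbingSet, the helper find(), the initial length 8, and the decision to rehash once the table is more than half full (rather than completely full) are all illustrative choices. It uses Java's own hashCode() as H, assumes non-null elements, and does not support removal.

```java
class ProbingSet {
    private Object[] table = new Object[8]; // initial length, chosen arbitrarily
    private int size = 0;

    // Returns the index of o if present, otherwise the free slot where it belongs.
    // Terminates because the table always keeps at least one free slot.
    private static int find(Object o, Object[] a) {
        int index = Math.floorMod(o.hashCode(), a.length); // H(o) % n, never negative
        while (a[index] != null && !o.equals(a[index])) {
            index = (index + 1) % a.length; // next hash code in the series: +1
        }
        return index;
    }

    boolean contains(Object o) {
        return table[find(o, table)] != null;
    }

    boolean add(Object o) {
        int index = find(o, table);
        if (table[index] != null) {
            return false; // already in the set
        }
        table[index] = o;
        size++;
        if (size * 2 > table.length) {
            rehash(); // assumption: grow at half full to keep probe sequences short
        }
        return true;
    }

    // Doubles the array and re-inserts every element; indices change
    // because the modulo is now taken with the new length.
    private void rehash() {
        Object[] old = table;
        table = new Object[old.length * 2];
        for (Object o : old) {
            if (o != null) {
                table[find(o, table)] = o;
            }
        }
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ProbingSet set = new ProbingSet();
        for (String name : new String[] {"Apu", "Bob", "Daria", "Abe", "Apu"}) {
            set.add(name);
        }
        System.out.println(set.size());            // 4, the second "Apu" is rejected
        System.out.println(set.contains("Abe"));   // true
        System.out.println(set.contains("Homer")); // false
    }
}
```

Growing before the table is full trades memory for speed, in line with the summary above: the fewer occupied slots a probe has to walk past, the closer contains() stays to O(1).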