Data Structures Used by Collections

How data is stored affects how efficiently it is accessed and modified. Where many items of data are related to each other it is usually most efficient to store them together in some form of structure. Different data structures have different characteristics; which is best depends on how the data will be used.

Different Collection classes use different underlying data structures (or combinations of data structures). Note that collections only work with objects, not with primitives. Hence data structures are described here only in terms of how they store objects.

Each data structure will be illustrated by an example of storing six "Person" objects: Fred, Dave, Bob, Al, Tom, and Mike.

Arrays

Arrays are the simplest data structure and are themselves objects. Multiple object references, all of the same type (or a subtype), are stored in consecutive memory locations in the array object. Element zero is stored at the start address of the array, and each subsequent element follows on from the end of the preceding one. Thus each element starts at an offset from the array start address, calculated by multiplying the element index by the number of bytes needed to store each element.

Pros: It is very quick (and always takes the same amount of time) to access any element by its position in the array. It is also very quick to iterate through an array.

Cons: Once an array has been created it is not possible to change its size. Sorting an array involves much copying of its contents from one memory location to another, hence is slow. Similarly, inserting an element involves moving other elements to make way for it, so is also slow. Accessing an element by content, unless the array is pre-sorted, could involve checking every element, hence is very slow.

Head First Java 2e (Sierra & Bates), p59-60, uses a good analogy of a tray of cups to describe an array.
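The array trade-offs described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the six people are represented by plain String names rather than Person objects, the class and method names (ArrayDemo, insertAt, indexOf) are hypothetical, and "Zoe" is an extra name invented for the insertion example.

```java
import java.util.Arrays;

public class ArrayDemo {

    // An existing array cannot grow, so inserting means allocating a new,
    // longer array and copying the old elements around the new one.
    static String[] insertAt(String[] source, int index, String value) {
        String[] result = new String[source.length + 1];
        // Copy the elements that stay in front of the new element.
        System.arraycopy(source, 0, result, 0, index);
        result[index] = value;
        // Copy the remaining elements, each shifted one place along.
        System.arraycopy(source, index, result, index + 1, source.length - index);
        return result;
    }

    // Access by content on an unsorted array: check each element in turn.
    static int indexOf(String[] array, String wanted) {
        for (int i = 0; i < array.length; i++) {
            if (array[i].equals(wanted)) {
                return i;
            }
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        String[] people = {"Fred", "Dave", "Bob", "Al", "Tom", "Mike"};

        System.out.println(people[3]);              // access by position: Al
        System.out.println(indexOf(people, "Tom")); // access by content: 4

        String[] bigger = insertAt(people, 2, "Zoe"); // hypothetical extra person
        System.out.println(Arrays.toString(bigger));
    }
}
```

Access by position is a single indexing operation, whereas both the insertion and the search touch many elements, which is the cost the pros and cons above describe.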

Linked Lists

A linked list consists of a chain of linked nodes. Each node contains a reference to a data object and a reference to the next node in the list (a doubly linked list also has a reference to the previous node in the list). The last node will have either a null reference (there is no next node) or a reference back to the first node (a circular linked list).

Pros: Information need not be stored in consecutive blocks of memory, so there is no restriction on changing the size of a linked list. Inserting and removing elements can be done quickly by rearranging the node references.

Cons: Accessing elements by position is slow because you have to follow the reference chain from the start of the list. Similarly, sorting is slow. Accessing elements by content could involve checking every element, hence is very slow.

Hash Tables

Hash tables are arrays that store elements in a position based on their content, hence build a form of Content Addressable Memory (CAM). All objects have a hashcode (the int value returned by their hashCode() method). A reference to the object is stored at an array index derived from the object's hashcode (the hashcode is reduced to fit the length of the array). However, hashcodes are not unique: two different objects could have the same hashcode, so each array element needs to be able to store more than one object reference. Hence practical hash tables are implemented with a reference to a linked list as each array element (known as chained overflow).

Pros: Insertion, removal, and access by content are all fast.

Cons: Access by position is not possible, duplicate elements are not possible, and iteration order is effectively unpredictable (it depends on the hashcodes).

Binary Trees

Binary trees store their elements in sorted order, in linked nodes similar to those of a linked list, but are reasonably fast for all operations. A binary tree has a number of nodes. Each node stores a reference to an object and also two references to subtrees. References to subtrees may be empty.
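
The node layout just described can be sketched as follows. This is a minimal illustration, not how any library class is implemented: it stores String names rather than Person objects, the class and method names (BinaryTreeSketch, Node, insert, inOrder) are hypothetical, and it assumes the usual ordering rule for a binary search tree (smaller elements go to the left, larger to the right) with no rebalancing.

```java
import java.util.ArrayList;
import java.util.List;

public class BinaryTreeSketch {

    // One node: a reference to the stored object plus two subtree references.
    static class Node {
        String value;
        Node left;   // null when there is no left subtree
        Node right;  // null when there is no right subtree

        Node(String value) {
            this.value = value;
        }
    }

    // Inserts 'value' below 'node' and returns the root of that subtree.
    static Node insert(Node node, String value) {
        if (node == null) {
            return new Node(value); // empty subtree: the new node goes here
        }
        int cmp = value.compareTo(node.value);
        if (cmp < 0) {
            node.left = insert(node.left, value);   // belongs before this node
        } else if (cmp > 0) {
            node.right = insert(node.right, value); // belongs after this node
        }
        // cmp == 0: the value is already stored, so nothing changes.
        return node;
    }

    // In-order traversal (left subtree, node, right subtree) visits the
    // elements in sorted order.
    static void inOrder(Node node, List<String> out) {
        if (node == null) {
            return;
        }
        inOrder(node.left, out);
        out.add(node.value);
        inOrder(node.right, out);
    }

    public static void main(String[] args) {
        Node root = null;
        for (String name : new String[] {"Fred", "Dave", "Bob", "Al", "Tom", "Mike"}) {
            root = insert(root, name);
        }
        List<String> sorted = new ArrayList<>();
        inOrder(root, sorted);
        System.out.println(sorted); // [Al, Bob, Dave, Fred, Mike, Tom]
    }
}
```

Because the names arrive in the order Fred, Dave, Bob, Al, the left side of this tree degenerates into a chain. Practical implementations rebalance the tree to avoid that, which this sketch deliberately leaves out.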

Each tree has a special node called the root node, which is where the tree starts. Trees are ordered in that all nodes in the left subtree come before the current node, and all nodes in the right subtree come after the current node.

Pros: Reasonable performance for inserting and removing elements and for accessing by content. Can be iterated over reasonably quickly.

Cons: Not the optimum solution in many cases; for any single operation, one of the other structures is usually faster.

Collection Classes and Their Data Structures

| Interface | Collection Class | Data Structure |
| List | ArrayList | Uses an array but is growable. ArrayList is used in many situations where an array would have been used in the past. |
| List | Vector | Very similar to ArrayList but with synchronized methods, hence slower. Not much used now. |
| List | LinkedList | Uses a doubly linked list. Faster than ArrayList for inserting and removing elements anywhere other than at the end of the list. Much slower than ArrayList for access by position and for iterating over. |
| Set | HashSet | Uses a hash table. Requires that stored objects override hashCode() and equals(). |
| Set | LinkedHashSet | Uses a doubly linked list as well as a hash table. This allows elements to be iterated over in insertion order rather than in the unpredictable order of HashSet. Iteration is also faster, as unused elements in the hash table array are not accessed. However, the overhead of maintaining the linked list makes basic set operations slower than with HashSet. Requires that stored objects override hashCode() and equals(). |
| Set | TreeSet | Uses a binary tree, hence elements are in sorted order and can be iterated over in that order. Not as fast as HashSet for basic set operations. Requires that stored objects either implement Comparable or have a Comparator. |
| Queue | PriorityQueue | Uses a data structure similar to a binary tree (a heap), so elements are removed from the head of the queue in sorted order. Differs from TreeSet in allowing duplicates. Requires that stored objects either implement Comparable or have a Comparator. |
| Map | HashMap | Like HashSet, uses a hash table, but each node has two stored objects. The first object is the "key", and it is this object's hashcode that determines the position in the hash table array. The second object is the "value" that is associated with the key. Duplicate keys are not allowed but duplicate values are. The key objects must override hashCode() and equals(). |
| Map | Hashtable | Very similar to HashMap but with synchronized methods, hence slower. Not much used now. |
| Map | LinkedHashMap | Uses a doubly linked list as well as a hash table. This allows elements to be iterated over in insertion order rather than in the unpredictable order of HashMap. Iteration is also faster, as unused elements in the hash table array are not accessed. However, the overhead of maintaining the linked list makes basic map operations slower than with HashMap. The key objects must override hashCode() and equals(). |
| Map | TreeMap | Uses a binary tree where each node has two stored objects, the "key" and the "value". Elements are in sorted order of the key and can be iterated over in that order. Not as fast as HashMap for basic map operations. The key objects must either implement Comparable or have a Comparator. |
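
The ordering behaviour of several of these classes can be seen by loading the same six names into each. The sketch below is only an illustration: it uses String names in place of Person objects (String already overrides hashCode() and equals() and implements Comparable), and the class and method names (CollectionOrderDemo, insertionOrder, sortedOrder, drain) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class CollectionOrderDemo {

    static final List<String> NAMES =
            Arrays.asList("Fred", "Dave", "Bob", "Al", "Tom", "Mike");

    // LinkedHashSet iterates in the order the elements were added.
    static List<String> insertionOrder() {
        return new ArrayList<>(new LinkedHashSet<>(NAMES));
    }

    // TreeSet iterates in sorted (natural) order.
    static List<String> sortedOrder() {
        return new ArrayList<>(new TreeSet<>(NAMES));
    }

    // Repeatedly removes the head of the queue until it is empty.
    static List<String> drain(PriorityQueue<String> queue) {
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        // HashSet order depends on the hashcodes, so no order is promised.
        System.out.println(new HashSet<>(NAMES));

        System.out.println(insertionOrder()); // [Fred, Dave, Bob, Al, Tom, Mike]
        System.out.println(sortedOrder());    // [Al, Bob, Dave, Fred, Mike, Tom]

        // Unlike the sets, PriorityQueue keeps a duplicate element.
        PriorityQueue<String> queue = new PriorityQueue<>(NAMES);
        queue.add("Al");
        System.out.println(drain(queue)); // [Al, Al, Bob, Dave, Fred, Mike, Tom]
    }
}
```

The PriorityQueue is drained with poll() rather than iterated, because only removal from the head is in sorted order.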