Data Structures Used by Collections
How data is stored affects how efficiently it is accessed and modified. Where many items of data are
related to each other it is usually most efficient to store them together in some form of structure.
Different data structures have different characteristics; which is best depends on how the data will be
used. Different Collection classes use different underlying data structures (or combinations of data
structures). Note that collections only work with objects, not with primitives. Hence data structures are
described here only in terms of how they store objects. Each data structure will be illustrated by an
example of storing six "Person" objects: Fred, Dave, Bob, Al, Tom, and Mike.
Arrays
Arrays are the simplest data structure and are themselves objects. Multiple object references, all of the
same type (or a subtype), are stored in consecutive blocks of memory in the array object. Element zero
of the array is stored at the start address of the array. Each subsequent element follows on from the
end of the preceding element. Thus each element starts at an offset from the array start address
calculated by multiplying the element index by the number of bytes needed to store each
element.
Pros: It is very quick (and will always take the same amount of time) to randomly access any element
by position in the array. It is also very quick to iterate through an array.
Cons: Once an array has been created it is not possible to change its size. Sorting an array involves
much copying of its contents from one memory location to another, hence is slow. Similarly,
inserting an element involves moving other elements to make way for it and so is also slow.
Accessing an element by content, unless the array is pre-sorted, could involve checking every
element, hence is slow for large arrays.
Head First Java 2e (Sierra & Bates) p59-60 uses a good analogy of a tray of cups to describe an array.
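The offset calculation and the cost of insertion described above can be sketched as follows. This is only an illustration: plain String names stand in for the Person objects, the helper names offsetOf and insertAt are hypothetical, "Sue" is an invented extra person, and the 4 bytes per reference is an assumption (the real size depends on the JVM).

```java
import java.util.Arrays;

class ArrayDemo {
    // Hypothetical helper: byte offset of an element from the start of the
    // array data, assuming every element occupies the same number of bytes.
    static long offsetOf(int index, int bytesPerElement) {
        return (long) index * bytesPerElement;
    }

    // An array cannot grow, so "inserting" means building a larger array and
    // copying every existing element, shifting the later ones up by one.
    static String[] insertAt(String[] source, int position, String value) {
        String[] result = new String[source.length + 1];
        System.arraycopy(source, 0, result, 0, position);
        result[position] = value;
        System.arraycopy(source, position, result, position + 1, source.length - position);
        return result;
    }

    public static void main(String[] args) {
        String[] people = {"Fred", "Dave", "Bob", "Al", "Tom", "Mike"};
        System.out.println(people[3]);      // access by position: prints Al
        System.out.println(offsetOf(3, 4)); // assuming 4-byte references: prints 12
        // "Sue" is a made-up extra person used only for this illustration.
        System.out.println(Arrays.toString(insertAt(people, 2, "Sue")));
    }
}
```

Access by position is a single indexed read, whereas insertAt touches every element, which is why insertion is listed among the cons.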
Linked Lists
A linked list consists of a chain of linked nodes. Each node contains a reference to a data object and a
reference to the next node in the list (a doubly linked list will also have a reference to the previous
node in the list). The last node will have either a null reference (there is no next node) or a reference
back to the first node (a circular linked list).
Pros: Information need not be stored in consecutive blocks of memory so there is no restriction on
changing the size of a linked list. Inserting and removing elements can be done quickly by
rearranging the node references.
Cons: Accessing elements by position is slow because you have to follow the reference chain from
the start of the list. Similarly sorting is slow. Accessing elements by content could involve
checking every element, hence is very slow.
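A minimal sketch of a singly linked list may make the trade-off concrete. The class and method names here (LinkedListSketch, insertAfter, get) are hypothetical, and String names stand in for the Person objects.

```java
class LinkedListSketch {
    static final class Node {
        final String person; // reference to the data object
        Node next;           // reference to the next node; null marks the end

        Node(String person, Node next) {
            this.person = person;
            this.next = next;
        }
    }

    // Insertion only rewires two references, however long the list is.
    static Node insertAfter(Node node, String person) {
        Node added = new Node(person, node.next);
        node.next = added;
        return added;
    }

    // Access by position must follow the chain from the head of the list.
    // Assumes 0 <= index < length of the list.
    static String get(Node head, int index) {
        Node current = head;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current.person;
    }

    public static void main(String[] args) {
        Node head = new Node("Fred", null);
        Node dave = insertAfter(head, "Dave");
        insertAfter(dave, "Al");
        insertAfter(dave, "Bob");         // list is now Fred, Dave, Bob, Al
        System.out.println(get(head, 2)); // prints Bob
    }
}
```

Here insertAfter does a fixed amount of work, while get takes longer the further the element is from the head.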
Hash Tables
Hash tables are arrays that store elements in a position based on their content, hence build a form of
Content Addressable Memory (CAM). All objects have a hashcode (the int value returned by their
hashCode() method). A reference to the object is stored at an array index derived from the object's
hashcode (for example, the hashcode reduced modulo the length of the array).
However, hashcodes are not unique: two different objects could have the same hashcode, and different
hashcodes can map to the same index. Hence each array element needs to be able to store more than one
object reference. Practical hash tables are therefore implemented with a reference to a linked list as
each array element (known as chained overflow).
Pros: Insertion, removal, and access by content are all fast.
Cons: Accessing elements by position is not possible and duplicate elements are not allowed. Iteration
order is effectively unpredictable (it depends on the hashcodes).
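The chained approach can be sketched as below. This is a simplified illustration, not how java.util.HashSet is actually implemented: the class name ChainedHashTable is hypothetical, the table never resizes, and the index is taken as the hashcode modulo the number of buckets, which is one simple way of mapping a hashcode to an array position.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTable {
    // Each bucket holds a linked list so that colliding elements can coexist.
    private final List<LinkedList<Object>> buckets = new ArrayList<>();

    ChainedHashTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // floorMod keeps the index non-negative even for negative hashcodes.
    int indexFor(Object element) {
        return Math.floorMod(element.hashCode(), buckets.size());
    }

    // Returns false if an equal element is already stored (no duplicates).
    boolean add(Object element) {
        LinkedList<Object> chain = buckets.get(indexFor(element));
        if (chain.contains(element)) {
            return false;
        }
        chain.add(element);
        return true;
    }

    // Only the one chain selected by the hashcode is searched, using equals().
    boolean contains(Object element) {
        return buckets.get(indexFor(element)).contains(element);
    }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(8);
        for (String person : new String[]{"Fred", "Dave", "Bob", "Al", "Tom", "Mike"}) {
            table.add(person);
        }
        System.out.println(table.contains("Bob")); // prints true
        System.out.println(table.add("Bob"));      // prints false: already present
    }
}
```

Because both hashCode() and equals() are consulted, the two methods must agree with each other for any class stored in such a table.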
Binary Trees
Binary trees are built from linked nodes (in a similar way to linked lists) but keep their elements in
sorted order and are reasonably fast for all operations. A binary tree has a number of nodes. Each node
stores a reference to an object and also two references to subtrees, either of which may be empty. Each
tree has a special node called the root node, which is where the tree starts. Trees are ordered in that all
nodes in the left subtree come before the current node, and all nodes in the right subtree come after it.
Pros: Reasonable performance for inserting and removing elements and accessing by content. Can be
iterated over reasonably quickly.
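Cons: Rarely the fastest choice for any single operation: a hash table is faster for access by content
and an array is faster for access by position.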
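The ordering rule can be illustrated with a small unbalanced binary search tree. The names used (BinaryTreeSketch, insert, inOrder) are hypothetical and String names stand in for the Person objects. Note that this sketch does no rebalancing, so its performance depends on the order in which elements are inserted.

```java
import java.util.ArrayList;
import java.util.List;

class BinaryTreeSketch {
    static final class Node {
        final String person;
        Node left;  // subtree of elements that sort before this one
        Node right; // subtree of elements that sort after this one

        Node(String person) {
            this.person = person;
        }
    }

    Node root;

    // Walks down from the root, going left or right at each node.
    // Returns false if an equal element is already in the tree.
    boolean insert(String person) {
        if (root == null) {
            root = new Node(person);
            return true;
        }
        Node current = root;
        while (true) {
            int cmp = person.compareTo(current.person);
            if (cmp == 0) {
                return false;
            }
            if (cmp < 0) {
                if (current.left == null) {
                    current.left = new Node(person);
                    return true;
                }
                current = current.left;
            } else {
                if (current.right == null) {
                    current.right = new Node(person);
                    return true;
                }
                current = current.right;
            }
        }
    }

    // In-order traversal: left subtree, then the node, then right subtree.
    List<String> inOrder() {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node == null) {
            return;
        }
        collect(node.left, out);
        out.add(node.person);
        collect(node.right, out);
    }

    public static void main(String[] args) {
        BinaryTreeSketch tree = new BinaryTreeSketch();
        for (String person : new String[]{"Fred", "Dave", "Bob", "Al", "Tom", "Mike"}) {
            tree.insert(person);
        }
        System.out.println(tree.inOrder()); // prints [Al, Bob, Dave, Fred, Mike, Tom]
    }
}
```

With this insertion order Fred becomes the root, and the in-order traversal visits the names alphabetically regardless of the order in which they were added.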
Interface  Collection Class  Data Structure

List       ArrayList         Uses an array but is growable. ArrayList is used in many
                             situations where an array would have been used in the past.

           Vector            Very similar to ArrayList but with synchronized methods,
                             hence slower. Not much used now.

           LinkedList        Uses a doubly linked list. Faster than ArrayList for
                             inserting and removing elements anywhere other than at the
                             end of the list. Much slower than ArrayList for access by
                             position and for iteration.

Set        HashSet           Uses a hash table. Requires that stored objects override
                             hashCode() and equals().

           LinkedHashSet     Uses a doubly linked list as well as a hash table. This
                             allows elements to be iterated over in insertion order
                             rather than the unpredictable order of HashSet. Iteration is
                             also faster as unused elements in the hash table array are
                             not accessed. However, the overhead of maintaining the
                             linked list makes basic set operations slower than HashSet.
                             Requires that stored objects override hashCode() and
                             equals().

           TreeSet           Uses a binary tree, hence elements are in sorted order and
                             can be iterated over in that order. Not as fast as HashSet
                             for basic set operations. Requires that stored objects
                             either implement Comparable or have a Comparator.

Queue      PriorityQueue     Uses a binary heap, a data structure similar to a binary
                             tree. The smallest element is always at the head of the
                             queue, but iteration is not guaranteed to be in sorted
                             order. Differs from TreeSet in allowing duplicates.
                             Requires that stored objects either implement Comparable or
                             have a Comparator.

Map        HashMap           Like HashSet uses a hash table, but each node has two
                             stored objects. The first object is the "key" and it is
                             this object's hashcode that determines the position in the
                             hash table array. The second object is the "value" that is
                             associated with the key. Duplicate keys are not allowed but
                             duplicate values are. The key objects must override
                             hashCode() and equals().

           Hashtable         Very similar to HashMap but with synchronized methods,
                             hence slower. Not much used now.

           LinkedHashMap     Uses a doubly linked list as well as a hash table. This
                             allows elements to be iterated over in insertion order
                             rather than the unpredictable order of HashMap. Iteration is
                             also faster as unused elements in the hash table array are
                             not accessed. However, the overhead of maintaining the
                             linked list makes basic map operations slower than HashMap.
                             The key objects must override hashCode() and equals().

           TreeMap           Uses a binary tree where each node has two stored objects,
                             the "key" and the "value". Elements are in sorted order of
                             the key and can be iterated over in that order. Not as fast
                             as HashMap for basic map operations. The key objects must
                             either implement Comparable or have a Comparator.
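As a rough illustration of the requirements listed above, the sketch below defines a hypothetical Person class (the original does not show its fields, so a single name field is assumed) that overrides hashCode() and equals() and implements Comparable, then stores the six example people in several of the collection classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class CollectionChoiceDemo {
    static List<Person> people() {
        List<Person> list = new ArrayList<>();
        for (String name : new String[]{"Fred", "Dave", "Bob", "Al", "Tom", "Mike"}) {
            list.add(new Person(name));
        }
        return list;
    }

    // LinkedHashSet iterates in the order the elements were added.
    static String insertionOrder() {
        return new LinkedHashSet<Person>(people()).toString();
    }

    // TreeSet iterates in the order defined by compareTo().
    static String sortedOrder() {
        return new TreeSet<Person>(people()).toString();
    }

    // PriorityQueue keeps the smallest element at its head.
    static String firstFromQueue() {
        return new PriorityQueue<Person>(people()).peek().toString();
    }

    public static void main(String[] args) {
        Set<Person> hashed = new HashSet<>(people());
        // Found only because Person overrides both hashCode() and equals().
        System.out.println(hashed.contains(new Person("Bob"))); // prints true
        System.out.println(insertionOrder()); // prints [Fred, Dave, Bob, Al, Tom, Mike]
        System.out.println(sortedOrder());    // prints [Al, Bob, Dave, Fred, Mike, Tom]
        System.out.println(firstFromQueue()); // prints Al
    }
}

// Hypothetical Person class: a single name field is an assumption.
final class Person implements Comparable<Person> {
    final String name;

    Person(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Person && name.equals(((Person) other).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode(); // equal people must produce equal hashcodes
    }

    @Override
    public int compareTo(Person other) {
        return name.compareTo(other.name);
    }

    @Override
    public String toString() {
        return name;
    }
}
```

If Person did not override hashCode() and equals(), the contains call would compare object identity and report false for a newly created "Bob".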