Appendix A
ABSTRACT DATA TYPES IN JAVA
ab-stract: That which comprises or concentrates in itself the essential qualities of a larger thing or of several things.
– The New Merriam-Webster Pocket Dictionary
In practice, the larger the program, the harder the debugging and the more limited the
confidence in its correctness. One of the basic programming rules is that no method
should ever exceed a page. Years of experience have shown that the best approach is to
split the program into small, coherent, and understandable pieces, or modules, and then fit
them together. Generally speaking, a module is a unit in a larger software system that
bundles together some data and some operations and has a carefully defined interface.
The external users of the module can make use of the operations and data provided in
the module interface, but the internal implementation of the module is concealed and is
made inaccessible to external users. Thus the hidden internal data representation can be
completely replaced without affecting the external users. This is called substitutability of
data representations, and it permits us to improve the efficiency of a software system by
replacing less efficient data representations with more efficient ones.
Most of the algorithms in this handout are too small to involve this technique. Nonetheless, it can be used to separate data structures from algorithms. Consider, for example,
any above-mentioned algorithm for sorting a set of numbers. It is clear that the algorithm
will work, even without specifying what data structure is used to represent the set. This
separation of data structure from algorithm permits us to study each in isolation as well as
to organize and simplify them. This concept is called data abstraction. An abstraction of
a thing has two qualities: it suppresses irrelevant details and it seeks to isolate the essence
of the thing being abstracted.
COMPSCI.220FT
A.1 Abstract Data Types
Abstract data types (ADTs) are a way of organizing the objects and operations that define a data type in such a way that the specification and behaviour of the data type are
rigorously separated from the data type's implementation. Java provides strong, general-purpose support for modular programming through its classes, interfaces, and packages.
• A Java class is an example of an ADT. The ADT's externally accessible operations and data are given by the public methods and data fields declared in the class. The ADT's hidden implementation details are represented by the private data fields and methods of the class.
• Java interfaces can be used to define ADTs that can incorporate general-purpose replaceable data components. The interface defines abstract behavioural characteristics that allowable components must implement. Any specific kinds of data components that implement the interface can then be used as plug-compatible components, suitable for plugging in to the ADT.
• Java packages are named collections of related classes and interfaces that can be separately compiled for use in Java applets or applications. Typically, packages such as the Java util (utilities) package or the Java awt (Abstract Window Toolkit) package bundle collections of useful software components into a software component library suitable for use by others in building their own Java programs.
By using Java classes to implement ADTs, the hidden implementation details can be
modified (or can even be totally replaced) within the local boundaries of the Java class
definitions without having to make a single external change elsewhere in the program.
Ease of modification is thus convincingly achieved.
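The following minimal sketch illustrates this point. The class IntStack and its method names are hypothetical and are not part of this handout or of the Java library. Only the public methods form the ADT's interface; the private array is the hidden representation and could be swapped for a linked structure without touching any code that uses the class.

```java
import java.util.Arrays;

// Hypothetical ADT: a stack of ints. Only the public methods are visible to users.
class IntStack {
    // Hidden representation: a growable array. It could be replaced by a
    // linked structure without changing any code that uses IntStack.
    private int[] items = new int[4];
    private int size = 0;

    public void push(int value) {
        if (size == items.length) {
            items = Arrays.copyOf(items, 2 * size); // grow when full
        }
        items[size++] = value;
    }

    public int pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        return items[--size];
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public static void main(String[] args) {
        IntStack stack = new IntStack();
        for (int i = 1; i <= 5; i++) {
            stack.push(i);
        }
        System.out.println(stack.pop()); // prints 5, the most recently pushed value
    }
}
```

Because items and size are private, no outside code can depend on the array, which is what makes the replacement safe.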
Although the terms data type (or just type), data structure, and ADT sound alike, they
have different meanings. In a programming language, the data type of a variable is the
set of values that the variable may assume (e.g., a variable of type boolean can assume
either the value true or the value false, but no other value).
An ADT is a mathematical model, together with various operations defined on the model.
We shall design algorithms in terms of ADTs, but to implement an algorithm in a particular programming language we must find some way of representing the ADTs in terms
of data types and operators supported by the programming language itself. To represent
the mathematical model underlying an ADT we use data structures, which are collections
of variables, possibly of several different data types, connected in various ways.
ADT in mathematics
Here are some standard examples of mathematical entities, not tied to any particular representation.
• A set is a collection of zero or more entries. An entry may not appear more than once. A set of n entries may be denoted {a_1, a_2, ..., a_n}, but the position of an entry has no significance; {0, 3, 6, 7, 8} and {3, 0, 8, 7, 6} represent just the same set.
• A map is a special kind of set, namely, a set of pairs, each pair representing a one-dimensional mapping from one element to another. For example, a dictionary (words mapped to meanings) or the conversion from base 2 to base 10 are maps.
• A multiset is a set in which repeated elements are allowed; e.g., {1, 3, 1, 5, 4, 0} is a multiset. Multisets are generally easier to deal with than sets, since checking for duplicates is expensive.
• A sequence is an ordered collection of zero or more entries, denoted [a_1, a_2, ..., a_n]. The position of an entry in a sequence is significant; for example, the fifth entry, or the successor of a given entry, may be referred to.
• A graph G = [V, E] is a set V of vertices (nodes) and a set E of edges (arcs, links), i.e. two-element subsets of V. This definition excludes self-loops (edges from a vertex to itself) and parallel edges (two edges connecting the same two vertices).
An example is given in Figure A.1.
V = { a, b, c, d }
E = { { a, d }, { d, c }, { a, c } }
Figure A.1: Graph G = [V, E] with four vertices and three edges (vertex b is isolated).
This list can be easily extended to include, for example, directed graphs (digraphs), complex numbers, matrices, etc. When one of these entities is used in an algorithm, certain
operations are performed on it (say, insertions and membership tests on a set). This observation leads naturally to the concept of an ADT: a mathematical entity together with
some operations defined on it.
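The difference between a set and a sequence can be checked directly in Java. The sketch below is illustrative only: the class SetVersusSequence and its methods are hypothetical names, and it uses generics and the java.util collection classes of later Java versions. It compares the two example sets given above, once as sets and once as sequences.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: contrasts set semantics with sequence semantics.
class SetVersusSequence {

    // True when both arrays hold the same entries, ignoring order and repeats.
    static boolean sameSet(Integer[] a, Integer[] b) {
        Set<Integer> sa = new HashSet<Integer>(Arrays.asList(a));
        Set<Integer> sb = new HashSet<Integer>(Arrays.asList(b));
        return sa.equals(sb);
    }

    // True only when the entries match position by position.
    static boolean sameSequence(Integer[] a, Integer[] b) {
        List<Integer> la = new ArrayList<Integer>(Arrays.asList(a));
        List<Integer> lb = new ArrayList<Integer>(Arrays.asList(b));
        return la.equals(lb);
    }

    public static void main(String[] args) {
        Integer[] x = {0, 3, 6, 7, 8};
        Integer[] y = {3, 0, 8, 7, 6};
        System.out.println(sameSet(x, y));      // true: position is irrelevant in a set
        System.out.println(sameSequence(x, y)); // false: position matters in a sequence
    }
}
```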
The ADT concept
The ADT concept has been implicitly used since the beginning of modern computing
history. For example, the mathematical entity integer with the operations addition, subtraction, negation, multiplication, division, and the comparisons, has always been at the
heart of computing machinery, and it provides a good illustration of the advantages to be
gained from specifying ADTs.
1. Users of integer never need concern themselves with the implementation of the
operations: they know what the operations do, and can use them effectively without
ever knowing what (micro)electronic circuitry is being employed.
2. The implementor of integer, in this case a hardware designer, is free to experiment
with different implementations. All that matters is that the right result be returned.
By providing a clean interface between use and implementation, the ADT separates the
two and clarifies the task of both. The integer ADT model consists of a finite range of
integers (either machine or language dependent) together with those operations supported
by the language that can be performed directly on integers. As a programmer, you do not
need to understand how the integers will be represented internally (binary, two’s complement form, etc.) or how the specific operations have been implemented by the language
processor in order to use them in writing a program.
When a class of data objects or data structures belongs to an ADT, it should not be necessary for a programmer to know the internal representation of the ADT data objects in
order to use or manipulate them within the rules for the class; this property is known as
information hiding. Information hiding is important for several reasons.
1. By hiding the internal structure of a data object, the user is able to work at a higher
programming level and is less likely to inadvertently misuse data objects of the
class (a prime source of bugs is thus avoided).
2. Less sophisticated users are able to use data objects effectively without having to
understand the complexities of the internal structure of the data object or how the
operations for the ADT class were implemented.
The ADT concept is useful in the study of data structures. As each new data structure is
introduced, the set of procedures developed to create and manipulate the structure can be
viewed as operations to be performed on the class of data objects defined by the data structure. The ADT concept implies that the manipulation of the ADT data objects is restricted
to only those operations that are part of the ADT definition. This restriction simplifies program development and significantly reduces a major source of program bugs.
We think of an ADT as a mathematical data model with a collection of operations defined
on that model. Sets of integers, together with the operations of union, intersection and
set difference, form a simple example of an ADT. In an ADT, the operations can take as
operands not only instances of the ADT being defined but other types of operands, say,
integers or instances of another ADT, and the result of an operation can be other than an
instance of that ADT. However, at least one operand, or the result, of any operation is of
the ADT in question.
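As a rough illustration of such an ADT, the sketch below builds union, intersection, and set difference on top of java.util.TreeSet. The class IntSetOps and its method names are assumptions made for this example; only addAll(), retainAll(), removeAll(), and contains() are actual library methods. The last line of main() shows an operation whose result (a boolean) is not an instance of the ADT itself.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class: set operations on sets of integers.
class IntSetOps {

    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a); // copy, so a stays unchanged
        result.addAll(b);
        return result;
    }

    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a);
        result.retainAll(b); // keep only the entries that are also in b
        return result;
    }

    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<Integer>(a);
        result.removeAll(b); // drop every entry that is in b
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<Integer>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> b = new TreeSet<Integer>(Arrays.asList(3, 4, 5, 6));
        System.out.println(union(a, b));        // [1, 2, 3, 4, 5, 6]
        System.out.println(intersection(a, b)); // [3, 4]
        System.out.println(difference(a, b));   // [1, 2]
        System.out.println(a.contains(3));      // true: the result is a boolean, not a set
    }
}
```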
ADTs are generalizations of primitive data types (integer, real, and so on), just as procedures or methods are generalizations of primitive operations (+, -, and so on). The ADT
encapsulates a data type in the sense that the definition of the type and all operations on
that type are localized to one section of the program (therefore, each Java class can be
treated as an ADT). If you wish to change the implementation of an ADT, you know
where to look, and by revising one small section you can be sure that there is no subtlety
elsewhere in the program that will cause errors concerning this data type. Outside the
section in which the ADT’s operations are defined, you can treat the ADT as a primitive
type; you have no concern with the underlying implementation.
Simple example: the Palindrome class. Using the ADT principles you can write programs that use a data type without knowing anything about its implementation. In fact,
that is the sign of a properly specified interface. If you need to peek into the code that implements a data type in order to use the type, then the type is not properly specified.
Let us use a sample program called Palindrome to illustrate these principles with a
slightly more complex data structure than integers. A palindrome is a character string that is the
same when read forward or backward, for instance, “anna”, “bob”, and “+*=*+”. Since
computer character sets differentiate between the lower- and uppercase letters, “Anna”
is not a palindrome. Our program will read a character string from the terminal, determine whether it is a palindrome, and print an appropriate message. The java.lang
package contains the desired string data structures, namely, the classes String and
StringBuffer. The instance method equals() of the first class permits you to
check whether two strings are the same and the method reverse() of the second class
is able to reverse a string. The data types String and StringBuffer are used below
without knowing any of their implementation details; all we rely on are their specifications in the package
java.lang.
 1  import java.io.*;     // methods for reading input data
 2
 3  public class Palindrome {
 4
 5    public static void main( String args[] ) {
 6
 7      String s = "";
 8      BufferedReader in = new BufferedReader(
 9                            new InputStreamReader( System.in ) );
10      System.out.println("Enter one string per line");
11      System.out.println("(an empty line stops the program)");
12
13      while( true ) {
14        try {
15          s = in.readLine();          // null at end of input
16        } catch( IOException e ) {
17          s = "";                     // treat a read error as an empty line
18        } finally {
19          if ( s != null && s.length() > 0 ) {
20            StringBuffer fsb = new StringBuffer( s );
21            StringBuffer rsb = fsb.reverse();
22            if ( s.equals( rsb.toString() ) )
23              System.out.println("is a palindrome !!!");
24            else
25              System.out.println("is not a palindrome ...");
26          } else {
27            System.out.println( "-------END-----" );
28            System.exit(0);
29          }
30        }
31      }
32    }
33  }
Lines 1, 8–12, and 14–17 permit you to read a text line from your terminal. Because
the BufferedReader's method readLine() may throw an IOException, the call must
be enclosed in a try-catch construction (or the exception must be declared); here a
try-catch-finally construction is used. Lines 13 and 31 produce the infinite
loop for entering and checking text lines, and lines 19 and 26–29 terminate the program when an empty line is entered. Lines 20 and 21 reverse a given text by using
the StringBuffer data type and its instance method reverse(). To
compare the input string to the reversed one using the method equals() of the class
String, the method toString() converts the StringBuffer data to String
data. Of course, you may write another version of the program, giving similar results like
    Enter one string per line
    (an empty line stops the program)
    Anna
    It is not a palindrome ...
    ANNA
    It is a palindrome !!!
    westtsew
    It is a palindrome !!!
    %%%%%%%
    It is a palindrome !!!
    l
    It is a palindrome !!!

    -------END-----

But in either case, you need not know how the classes String and StringBuffer
are implemented in Java. We have an interface that includes everything the user needs to
know in order to use the data types, and everything an implementor needs to know in order
to write code to implement the type. It is a contract between the user and implementor.
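One such alternative version might separate the palindrome test from the input and output code. The sketch below is a hedged illustration: the class PalindromeCheck and the method isPalindrome() are hypothetical names, and it uses StringBuilder, the unsynchronized counterpart of StringBuffer added in later Java versions. As before, only the documented behaviour of reverse(), toString(), and equals() is relied upon.

```java
// Hypothetical variant of the Palindrome program: the test is a reusable method.
class PalindromeCheck {

    // True when s reads the same forward and backward (case-sensitive).
    static boolean isPalindrome(String s) {
        String reversed = new StringBuilder(s).reverse().toString();
        return s.equals(reversed);
    }

    public static void main(String[] args) {
        String[] samples = {"Anna", "ANNA", "westtsew"};
        for (String s : samples) {
            System.out.println(s + (isPalindrome(s)
                    ? ": It is a palindrome !!!"
                    : ": It is not a palindrome ..."));
        }
    }
}
```

Keeping the test in its own method means it can be checked without a terminal, and the input loop can change without affecting it.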
A.2 ADTs and Java classes
Abstract data types are collections of objects and operations that present well-defined
abstract properties to their users, while hiding the way they are represented in terms of
lower-level data representations. It should be emphasized that there is no limit to the
number of operations that can be applied to instances of a given mathematical model, or
object. Each set of operations defines a distinct ADT.
Java classes, interfaces, and packages are used to express the concepts of modularity, information hiding, and data abstraction. The access modifiers (public, private, and
so on) for data fields and methods in Java class definitions permit us to define whether
these data fields and methods are public (and thus visible to and available for use by
outside users of the class) or are private (and thus invisible to and unavailable for use
by outside users).
Objects and operations at a higher level of data abstraction are represented by organizing
objects and operations at lower levels. The ADTs at the highest level of data abstraction
(structures such as sets, trees, lists, stacks, and queues) can be represented in a variety
of different ways by lower-level representations, which fall into two broad classes:
sequential representations and linked representations (see footnote 1). At still lower levels, the linked
representations can themselves be represented in different ways, such as using parallel
arrays or Java objects that contain references to other Java objects in their data fields (see
Figure A.2).

Footnote 1. A sequential data representation uses a sequence of individual blocks of storage, each block being independently accessed by its unique address in that sequence. Linked data representations are created by linking
individual blocks of storage together using pointers.

Figure A.2: Levels of data abstraction. (Diagram labels: ADTs: lists, queues, stacks, sets, trees; sequential representations: arrays, strings; linked representations: implicit pointer representations, parallel arrays.)

For security reasons, Java does not provide explicit pointer values (in contrast to other
OOP languages such as C++ or Object Pascal). However, Java uses implicit pointers to
access arrays and objects. In fact, Java divides its data values into two classes: primitive
data values (such as integers, characters, boolean truth values, and floating point numbers)
and reference values that are references to objects and arrays. Such Java reference values
are just pointers to objects and arrays, even though Java does not provide any pointer following operations. Java reference values can be stored in data fields of objects, as items
in arrays, or as values of variables. This provides a satisfactory basis for implementing
linked data representations.
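A minimal sketch of a linked representation built from reference values is shown here. The names LinkedCells, Node, fromValues(), length(), and sum() are hypothetical and chosen for this example only. Each Node object is one block of storage, and its next field holds a reference (an implicit pointer) to the following block or null at the end of the chain.

```java
// Illustrative linked representation; all names are hypothetical.
class LinkedCells {

    // One block of storage: a value plus a reference to the next block.
    static final class Node {
        final int value;
        Node next; // reference value: refers to another Node, or is null

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Builds a chain holding the given values in order; returns null for no values.
    static Node fromValues(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            head = new Node(values[i], head); // link the new node in front
        }
        return head;
    }

    // Follows the references from node to node and counts the nodes.
    static int length(Node head) {
        int n = 0;
        for (Node p = head; p != null; p = p.next) {
            n++;
        }
        return n;
    }

    // Adds up the values stored along the chain.
    static int sum(Node head) {
        int total = 0;
        for (Node p = head; p != null; p = p.next) {
            total += p.value;
        }
        return total;
    }

    public static void main(String[] args) {
        Node list = fromValues(4, 5, 6);
        System.out.println(length(list)); // 3
        System.out.println(sum(list));    // 15
    }
}
```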
The Collections Framework, first introduced with Java 1.2, provides a well-designed
set of interfaces and classes for storing and manipulating groups of data as a single unit, a
collection (see footnote 2). In Java, a collection is a group of related data elements, organised into a single
object, with operations provided to manipulate the data. Java technology has always
offered support for collections, in particular via the Vector, Stack, Hashtable, and
Properties classes. But the new framework for collections in Java 1.2 has significant
advantages over the old classes. The old classes can still be used in Java 1.2, and they
remain available for use with Java 1.1 runtime environments. The advantages of the Java 1.2
Collections Framework include:
• Reduced programming effort.
• Support for software reuse, in that data structures conforming to the collection interfaces are reusable in a wide variety of contexts and applications.
• Easier to design with Application Programming Interfaces (APIs).
• Easier to pass collections between unrelated APIs.
• Increased program speed.
• Increased program quality (fewer bugs, easier maintenance).

Footnote 2. In common usage a collection is the same as the intuitive, mathematical concept of a set, so that the two
terms might be considered synonyms. But in mathematics each set entry can appear only once, whereas
collections generally have no such restriction.

Figure A.3: The Collections Framework hierarchy in Java 1.2. The diagram shows the interfaces (Collection, List, Set, Map, SortedSet, SortedMap), the abstract classes (AbstractCollection, AbstractList, AbstractSet, AbstractMap, AbstractSequentialList), and the concrete classes (HashSet, TreeSet, ArrayList, LinkedList, HashMap, TreeMap). Solid lines show relationships between the interfaces and the concrete classes that implement the interfaces, and
dot-dashed lines show relationships between the abstract classes and the concrete classes
extending the abstract ones.

Figure A.4: Historical classes in the Collections Framework. (Diagram labels: List, AbstractList, Map, Dictionary, Stack, Vector, Hashtable, Properties.)
The Collections Framework provides a convenient API to many of the ADTs such as
maps, sets, lists, trees, arrays, hashtables and other collections (see Figure A.3). Because
of their object-oriented design, the Java classes in the Collections Framework encapsulate
both the data structures and the algorithms associated with these abstractions. This offers a standard programming interface to many of the most common abstractions, without
burdening the programmer with too many procedures and interfaces. The operations supported by the collections framework nevertheless permit the programmer to easily define
higher level ADTs, such as stacks, queues, and so forth (see footnote 3).
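As a hedged sketch of how a higher-level ADT can be defined on top of the framework, the queue below delegates to a java.util.LinkedList. The class name SimpleQueue and its methods enqueue() and dequeue() are assumptions made for this example; the LinkedList methods addLast(), removeFirst(), isEmpty(), and size() are library methods. Generics from later Java versions are used.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical queue ADT built from a framework class.
class SimpleQueue<T> {
    // Hidden representation: users see only the queue operations below.
    private final LinkedList<T> items = new LinkedList<T>();

    public void enqueue(T item) {
        items.addLast(item); // new items join at the back
    }

    public T dequeue() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("queue is empty");
        }
        return items.removeFirst(); // oldest item leaves first
    }

    public boolean isEmpty() {
        return items.isEmpty();
    }

    public int size() {
        return items.size();
    }

    public static void main(String[] args) {
        SimpleQueue<String> queue = new SimpleQueue<String>();
        queue.enqueue("first");
        queue.enqueue("second");
        System.out.println(queue.dequeue()); // first
        System.out.println(queue.size());    // 1
    }
}
```

Since the LinkedList is private, the many other list operations (insertion in the middle, indexing, and so on) are not available to users, which restricts manipulation to the operations of the queue ADT.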
In such a hierarchy, the root Collection interface defines a group of objects, known as
its elements. Some Collection implementations allow duplicate elements and others
do not. A Set extends Collection but forbids duplicates. As you might expect, this
interface models the mathematical set ADT. A List also extends Collection; it allows
duplicates and introduces positional indexing, so that it is an ordered collection. The user
of a List generally has precise control over where in the List each element is inserted.
A Map extends neither Set nor Collection and maps keys to values. Maps cannot
have duplicate keys: each key can map to at most one value.
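The behaviours just described can be observed with a few lines of code. This is an illustrative sketch: CoreInterfacesDemo and its methods are hypothetical names, and generics from later Java versions are used. The same six words are stored in a Set, a List, and a Map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: how Set, List, and Map treat repeated entries.
class CoreInterfacesDemo {

    // A Set silently drops duplicates.
    static int distinctCount(String[] words) {
        Set<String> set = new HashSet<String>(Arrays.asList(words));
        return set.size();
    }

    // A List keeps every entry, in insertion order.
    static int totalCount(String[] words) {
        List<String> list = new ArrayList<String>(Arrays.asList(words));
        return list.size();
    }

    // A Map holds at most one value per key; put() replaces the old value.
    static Map<String, Integer> frequencies(String[] words) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String w : words) {
            Integer old = counts.get(w); // null if the key is absent
            counts.put(w, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"to", "be", "or", "not", "to", "be"};
        System.out.println(distinctCount(words));         // 4
        System.out.println(totalCount(words));            // 6
        System.out.println(frequencies(words).get("to")); // 2
    }
}
```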
Footnote 3. One might think that Map would extend Collection. In mathematics, a map is just a collection
of pairs. In the Collections Framework, however, the interfaces Map and Collection are distinct and
have no link in the hierarchy. The reason for this distinction is that the typical application of a Map is
to provide access to values stored by keys. The collection operations are all there, but in order to work with
key-value pairs instead of isolated elements, Map supports the basic operations get() and put()
on keys and values, which are not required by Set. Moreover, there are methods that return Set views of Map
objects, for example, Set set = aMap.keySet().

The last two collection interfaces (SortedSet and SortedMap) are merely
sorted versions of Set and Map. In Java, there are two ways to order objects: the
Comparable interface provides an automatic natural order on classes that implement it,
while the Comparator interface gives the programmer complete control over object ordering. Note that these are NOT core collection interfaces, but underlying infrastructure.
A SortedSet is a Set that maintains its elements in ascending order and provides several additional operations to take advantage of the ordering. A SortedMap is a Map that
maintains its mappings in ascending key order (it is the Map analogue of SortedSet).
The SortedSet interface permits you to build word lists and membership rolls, whereas
the SortedMap interface can be used for creating dictionaries and telephone directories.
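The two ways of ordering can be sketched with TreeSet, the SortedSet implementation. The class SortedDemo and its methods are hypothetical names used for illustration. The first method relies on the natural (Comparable) order of String, and the second passes a Comparator that puts shorter words first.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative only: natural ordering versus a programmer-defined ordering.
class SortedDemo {

    // Natural order: String implements Comparable, so no comparator is needed.
    static SortedSet<String> naturalOrder(String... words) {
        SortedSet<String> set = new TreeSet<String>();
        for (String w : words) {
            set.add(w);
        }
        return set;
    }

    // Comparator order: shorter words first, ties broken alphabetically.
    static SortedSet<String> byLength(String... words) {
        Comparator<String> shorterFirst = new Comparator<String>() {
            public int compare(String a, String b) {
                if (a.length() != b.length()) {
                    return a.length() < b.length() ? -1 : 1;
                }
                return a.compareTo(b);
            }
        };
        SortedSet<String> set = new TreeSet<String>(shorterFirst);
        for (String w : words) {
            set.add(w);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(naturalOrder("pear", "fig", "apple", "fig")); // [apple, fig, pear]
        System.out.println(byLength("pear", "fig", "apple", "fig"));     // [fig, pear, apple]
        System.out.println(naturalOrder("pear", "fig", "apple").first()); // apple
    }
}
```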
The following table shows the six collection implementations introduced with the Java
1.2 framework, in addition to the four historical collection classes (the framework is designed in such a way that the new and historical classes can interoperate). The historical
collection classes are called such because they have been used since the Java 1.0 release.
Figure A.4 shows how these historical classes have been integrated into the Collections
Framework.
                                      Implementation
Interface   Hash table   Resizable array   Balanced tree   Linked list   Historical classes
Set         HashSet      -                 TreeSet         -             -
List        -            ArrayList         -               LinkedList    Vector, Stack
Map         HashMap      -                 TreeMap         -             Hashtable, Properties
Java 1.2 provides two implementations of each interface, except for Collection, which
has no direct implementations but serves as a least common denominator for the other
collection interfaces. In each case, the primary implementations to be used, all other
things being equal, are HashSet, ArrayList, and HashMap, respectively. Note
that SortedSet and SortedMap have no rows in the table above. Each of these
interfaces has one implementation, and these implementations (TreeSet and TreeMap)
are listed in the Set and Map rows.
The Collections Framework is made up of the above set of interfaces for working with
groups of objects (the different interfaces describe the different types of groups). For
the most part, once you understand the interfaces, you understand the framework. While
you always need to create specific implementations of the interfaces, access to the actual
collection should be restricted to the use of the interface methods, thus allowing you to
change the underlying data structure, without altering the rest of your code.
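A short sketch of this practice follows. The class InterfaceAccess and its methods are hypothetical; the code uses generics from later Java versions. The methods refer only to the List interface, so the concrete class is named in exactly one place, the constructor call, and can be exchanged there.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Illustrative only: code written against the List interface.
class InterfaceAccess {

    // Adds three sample items using only interface methods.
    static List<String> fill(List<String> target) {
        target.add("a");
        target.add("b");
        target.add("c");
        return target;
    }

    // Concatenates the items from last to first using only interface methods.
    static String joinReversed(List<String> items) {
        StringBuilder sb = new StringBuilder();
        ListIterator<String> it = items.listIterator(items.size());
        while (it.hasPrevious()) {
            sb.append(it.previous());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only these two constructor calls name a concrete implementation.
        List<String> arrayBacked = fill(new ArrayList<String>());
        List<String> linked = fill(new LinkedList<String>());
        System.out.println(joinReversed(arrayBacked)); // cba
        System.out.println(joinReversed(linked));      // cba
    }
}
```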
All the above-mentioned Java 1.2 general-purpose implementations have consistent behaviour. They permit null elements, keys, and values, and have fail-fast iterators, which
detect illegal concurrent modification of a collection during iteration and fail quickly and
cleanly, rather than risking arbitrary behaviour in the future. Note that the new Java 1.2
implementations are now unsynchronized, in contrast to the previous classes Vector
and Hashtable, which were introduced in Java 1.0. This design decision was taken because the collections are frequently used in a manner where the synchronization is of no benefit (for
instance, single-threaded use, or read-only use, or use as a part of a larger data object
that does its own synchronization). If one needs a synchronized collection, there exist the
synchronization wrappers that allow any collection to be transformed into a synchronized
one.
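Both properties can be tried out with the sketch below, which is illustrative only. The class FailFastDemo and its methods are hypothetical names; Collections.synchronizedList() and ConcurrentModificationException are part of java.util. The first method modifies an ArrayList while iterating over it, and the second lets two threads append to one list through a synchronization wrapper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

// Illustrative only: fail-fast iteration and a synchronization wrapper.
class FailFastDemo {

    // Returns true if changing the list in the middle of an iteration
    // is detected by the fail-fast iterator.
    static boolean modificationDetected() {
        List<Integer> list = new ArrayList<Integer>();
        list.add(1);
        list.add(2);
        list.add(3);
        try {
            for (Integer x : list) {
                if (x == 1) {
                    list.add(99); // structural change during iteration
                }
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // Two threads append to one list through a synchronization wrapper;
    // returns the final number of elements.
    static int parallelAdds(final int perThread) {
        final List<Integer> shared =
                Collections.synchronizedList(new ArrayList<Integer>());
        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    shared.add(i);
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(modificationDetected()); // true
        System.out.println(parallelAdds(1000));     // 2000
    }
}
```

Without the wrapper, two threads adding to a plain ArrayList at the same time could lose elements or corrupt the list, which is why the wrapper is needed when a collection is shared between threads.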