Download word - Courses

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
CMPS 101
Algorithms and Abstract Data Types
ADTs and Modules in Java and ANSI C
Introduction
This document introduces the concepts of Modules and ADTs, and describes how to implement them
in both Java and ANSI C. Informally, an Abstract Data Type (ADT) is a collection of mathematical
objects of some kind, together with some associated operations on those objects. When an ADT is
used in a program, it is usually implemented in its own module. A module is a self-contained
component of a program having a well defined interface that details its role and relationship to the rest
of the program.
Why are ADTs necessary? The standard data types provided by many programming languages are not
powerful enough to capture the way we think about the higher level objects in our programs. This is
why most languages have a type declaration mechanism the allows the user to create high level types
as desired. Often the implementation of these high level types gets spread throughout the program,
creating complexity and confusion. Errors may occur when the legal operations on the high level types
are not well defined or are not consistently used.
The term Abstract Data Type can mean different things to different people, but for the purposes of this
course, an ADT consists of two things:
(1) A set S of “mathematical structures”, the elements of which are called states.
(2) An associated set of operations which can be applied to the states in S.
Each ADT instance or object has a state which is one of the mathematical structures in the set S. The
operations on S fall (roughly) into two classes. Manipulation procedures are operations which cause
an ADT instance to change it's state. Access functions are operations which return information about
an ADT instance, without altering it's state. In this course we will maintain a clear distinction between
the two types of operations. We will also from time to time consider operations which don't fall into
either category, but we will not use operations which belong to both categories.
An ADT is an abstract mathematical entity which exists apart from any program or computing device.
On the other hand, ADTs are frequently implemented by a program module. We will distinguish
between the mathematical ADT and it's implementation in a programming language. In fact, a single
ADT could have many different implementations, all with various advantages and disadvantages.
Example Consider an integer queue. In this case S is the set of all finite sequences of integers, and
the associated operations are: Enqueue, Dequeue, getFront, getLength, and isEmpty. The meanings of
these operations are given below. One possible state for this ADT is (5, 1, -7, 2, -3, 4, 2). (It is
recommended that the reader who is unfamiliar with elementary data structures such as queues, stacks,
and lists, review sections 10.1 and 10.2 of the text.)
1
Manipulation procedures
Enqueue
Dequeue
Insert a new integer at the back of the queue
Remove an integer from the front of the queue
Access functions
getFront
getLength
isEmpty
Return the integer at the front of the queue
Return the number of integers in the queue
Return true if length is zero, false otherwise
Other examples of mathematical structures which could form the basis for an ADT are: sets, graphs,
trees, matrices, polynomials, or finite sequences of such structures. In principle, the underlying set S
could be anything, but typically it is a set of discrete mathematical objects of some kind.
An ADT instance (or object) is always associated with a particular sequence, or history, of states,
brought about by the application of ADT operations. In our queue example we could have the
following sequence starting with the empty state  :
Operation
Enqueue(5)
Enqueue(1)
Enqueue(7)
Dequeue()
Enqueue(3)
getLength()
………..
State

(5)
(5, 1)
(5, 1, 7)
(1, 7)
(1, 7, 3)
(1, 7, 3)
Observe that if isEmpty is true for some state, then getFront and Dequeue are undefined on that state.
One option to deal with this situation would be to make special definitions for Dequeue and getFront
on an empty queue. We could for instance define getFront to return zero on an empty queue, and
define Dequeue to not change it's state. Unfortunately, these special cases complicate the ADT and
can easily lead to errors. A better solution is to establish preconditions for each operation indicating
exactly when (i.e. to which states) that operation can be applied. Thus a precondition for both getFront
and Dequeue is: ‘not isEmpty’. In order for an ADT to be useful, the user must be able to determine if
the preconditions for each operation are satisfied. Good ADTs clearly indicate the preconditions for
each operation, usually as a sequence of access function calls. Good ADTs also document their
postconditions, i.e. conditions which will be true after an operation is performed. For example, a
postcondition of Enqueue is ‘not isEmpty’. ADT operations can sometimes be thought of as functions
in the mathematical sense. Preconditions and postconditions then define the function's domain and
codomain. Only when all operations have been defined, along with all relevant preconditions and
postconditions, can we say that an ADT has been fully specified.
We often consider multiple instances of the same ADT. For example, we may speak of several
simultaneous integer queues. ADT operations should therefore specify which object is being operated
on. It is also possible for some operations to refer to multiple objects. We could for instance have an
access function called Equals which operates on two queues and returns true if they contain the same
elements in the same order and false otherwise, or a manipulation procedure called Concatenate which
empties one queue and places its contents at the end of the another queue.
2
It is sometimes helpful to think of an ADT object as being a ‘black box’ equipped with a control panel
containing buttons which can be pushed (manipulation procedures), and indicator lights which can be
read (access functions).
Integer queue
getFront
getLength
Enqueue
isEmpty
Dequeue
Note that some texts (including our own) define Dequeue so as to return the front element, as well as to
alter the state of the queue. In this example, the Dequeue operation deletes the element at the front of
the queue, but doesn’t return a value. We adopt this particular definition in order to maintain the
distinction between access functions and manipulation procedures. Note that such a change in the set
of ADT operations results in a different ADT. If for instance we implement our queue by storing
integers in an array of fixed size, then we should add an access function called isFull which reports on
whether there is room left in the array for another integer. Enqueue would then have the precondition
‘not isFull’. Although this ADT and our original ADT can both be legitimately called queues, they are
different ADTs.
Implementing ADTs
There is a straightforward way of implementing an ADT in both Java and ANSI C, once it has been
specified. The implementation strategies in these two languages look different on the surface, but
conceptually they are very similar, the differences being mostly syntactical.
Java
In Java an ADT is embodied in a class. A class contains fields (or member variables) which form the
‘mathematical structure’, and methods which implement the ADT operations. Such a class may also
contain some (private) inner classes as part of the ‘mathematical structure’. An instance of this class is
accessed by a reference variable which represents an instance of the ADT.
Example The inside of our integer queue ‘black box’ can be pictured as
Instance of Queue ADT
Fields
Reference Variable
myQueue
front
back
length
data
next
Private Node Class
3
The user of the ADT should never be allowed to directly access the ‘structure’ inside the ‘box’.
Instead a reference variable (myQueue) points to an instance of the class and is passed as an (implicit)
first argument to the methods of the class. For example, the call
myQueue.getFront()
would return the front element in the ADT object referenced by myQueue. For this reason (instance)
variables should always be declared as private. This is the idea of information hiding: the user of the
ADT cannot see or directly effect anything inside the ‘black box’ except through the official ADT
operations. In the implementation depicted above, a queue consists of a singly linked list of private
Node objects which cannot be directly accessed by the user of the Queue class. The purpose of this
restriction is to free the user from the responsibility of knowing the internal details of a Queue object,
which reduces the complexity of the user’s task. To the user, a Queue is simply a sequence of integers
which can be manipulated in certain ways.
Other methods are also needed but which do not correspond to access functions or manipulation
procedures. Among these are the constructors which create new ADT instances, and the toString
method which provides a string representation of the class. Our integer queue ADT in Java might
look something like:
// Queue.java
// An integer queue ADT
class Queue {
private class Node {
// Fields
int data;
Node next;
// Constructor
Node(int data) {...}
// toString: overrides Object’s toString method
public String toString() {...}
}
// Fields
private Node front;
private Node back;
private int length;
// Constructors
Queue() {...}
// Access functions
int getFront() {...}
int getLength() {...}
boolean isEmpty() {...}
4
// Manipulation procedures
void Enqueue(int data) {...}
void Dequeue() {...}
// other methods
// toString: Overrides Object's toString method.
public String toString() {...}
}
An ADT implementation should be fully tested in isolation before it is used in a larger program. The
following program serves this purpose.
// QueueTest.java
class QueueTest {
public static void main(String[] args){
// Allocate several Queue Objects, and manipulate
// them in various ways. Call each of the above
// ADT operations at least once.
... ... ...
}
}
Exercise Fill in the definitions of all of these methods, i.e. replace {...} where it appears by some
appropriate Java source code. A solution to this exercise will be posted on the webpage.
By convention, all of our ADT implementations in Java should follow this same pattern: private inner
classes, followed by fields, then constructors, access methods, manipulation procedures, then all other
methods. This convention is by no means universal, but we adopt it in this course for the sake of
uniformity. All ADT operations must state their own preconditions in a comment block, and then
check that those preconditions are satisfied before proceeding. If a precondition is violated, the
program should quit with a useful error message. This can be done efficiently in Java by throwing a
RuntimeException. (It is not necessary, and not recommended, that you write new Exception classes
to be thrown when preconditions are violated.)
A mentioned previously, a module is a part of a program which is isolated from the rest of the program
by a well defined interface. We think of modules as providing services (e.g. functions, data types, etc.)
to clients. A client is anything (program, person, another module) which uses a module’s services.
These services are said to be exported to the client or imported from the module.
Interface
Module
Client
The module concept supports the idea of information hiding, i.e. clients have no access to a module’s
implementation details (inside the black box). The client can only access the services exported through
5
the interface. Generally we will have a separate module for each ADT. As we’ve seen, an ADT
module in Java is embodied in a single .java file containing a single top level class, and possibly some
private inner classes. The module interface consists of all variables and methods in that class which
are not declared private.
ANSI C
In ANSI C the situation is somewhat different since modularity is not directly supported by the
language. An ADT implementation contains a struct which provides access to the ‘mathematical
structure’ underlying the ADT. The user of the ADT (i.e. the client module) is given a reference
which is a pointer to this struct. One C function is defined for each of the ADT operations. Each such
function takes an ADT reference as argument, specifying which instance of the ADT to operate on.
This reference is defined in a way that prevents the client from following the pointer to access the
interior of the ‘black box’, thus enforcing the information hiding paradigm.
Instance of Queue ADT
Queue struct
Pointer Variable
myQueue
front
back
length
data
next
Private Node struct
Two more C functions are necessary. One to create new objects (constructor) and one to free memory
associated with ADT instances no longer in use (destructor). It is the responsibility of these functions
to manage all of the memory inside the ‘black box’, balancing calls to malloc (or calloc) and free.
In C, we split our ADT module implementation into a .c file containing struct and function definitions,
and a .h file containing typedefs and prototypes of exported functions. The ADT interface by definition
consists of exactly that which appears in the .h file. Functions whose prototypes do not appear in this
file cannot be accessed from outside the ADT implementation, are therefore considered private.
Example
/* Queue.h */
typedef struct Queue* QueueRef;
/* Constructor-Destructor */
QueueRef newQueue(void);
void freeQueue(QueueRef* pQ);
/* Access functions */
int getFront(QueueRef Q);
int getLength(QueueRef Q);
int isEmpty(QueueRef Q);
/* Manipulation procedures */
void Enqueue(QueueRef Q, int data);
void Dequeue(QueueRef Q);
6
/* Other functions */
void printQueue(QueueRef Q, FILE* out);
In this example, Queue.h defines a pointer called QueueRef to a struct called Queue, which is not
defined in this file. This is how data hiding is implemented in C. The client module will #include
Queue.h so that the compiler recognizes calls to the exported functions. The client can also declare
variables of type QueueRef and define functions which take QueueRef arguments. Notice however
that the client cannot dereference through a QueueRef variable, since the Queue struct is not defined in
the file Queue.h. That definition appears in the next file.
/* Queue.c */
#include<stdio.h>
#include<stdlib.h>
#include "Queue.h"
/* Private inner Node struct, corresponding reference type, and
* constructor-destructor pair. Not exported.
typedef struct Node{
int data;
struct Node* next;
} Node;
typedef Node* NodeRef;
NodeRef newNode(int node_data) {...}
void freeNode(NodeRef* pN) {...}
/* Public Queue struct, constructor-destructor */
typedef struct Queue{
NodeRef front;
NodeRef back;
int length;
} Queue;
QueueRef newQueue(void){
QueueRef Q;
Q = malloc(sizeof(Queue));
Q->front = Q->back = NULL;
Q->length = 0;
return(Q);
}
void freeQueue(QueueRef* pQ) {...}
/* Access functions */
int getFront(QueueRef Q) {...}
int getLength(QueueRef Q) {...}
int isEmpty(QueueRef Q) {...}
/* Manipulation procedures */
void Enqueue(QueueRef Q, int data) {...}
void Dequeue(QueueRef Q) {...}
7
*
*/
/* Other functions */
void printQueue(QueueRef Q, FILE* out) {...}
Notice that the type NodeRef, as well as functions newNode() and freeNode(), do not appear in
the file Queue.h, and are therefore not available to the client. Exporting these items would give the
client access to the inside of the black box, violating the modularity principle. Notice also that another
public function called printQueue() is included in both Queue.h and Queue.c. This function prints
the state of a Queue object to a FILE handle (which may be stdout.) Function printQueue()
corresponds roughly to the toString() function in Java.
As before, it is necessary to test the ADT implementation in isolation before it is used in a larger
application.
/* QueueTest.c */
#include<stdio.h>
#include<stdlib.h>
#include "Queue.h"
int main(int argc, char* argv[]){
/* Call all of the above functions at least once */
return(0);
}
Exercise Complete the definitions of these functions by replacing { . . .} by appropriate C code. The
solution to this exercise will be posted on the webpage. Also read the handout entitled "Some
Additional Remarks on ADTs and Modules in ANSI C".
Some may (correctly) argue that our ANSI C Integer Queue is not really a general purpose queue, and
that we should really write a queue of ‘anythings’. The problem is that C’s type mechanism is not
advanced enough to properly deal with this issue. There are two possible solutions. The safer solution
is to simply edit your Integer Queue to be a queue of whatever you need a queue of. Simply changing
the appropriate ints to the new type will create a ready-made queue. This change can be easily
accomplished by defining the type QueueElement in the .h file as
typedef int QueueElement
The type QueueElement is used to refer to the things that are stored in a Queue. This methodology
lets you change the element type by editing a single line of code. (We follow this in the exercise
solution posted on the webpage.)
This simple fix has the drawback that if you want int queues and double queues in the same
program, then you need two different Queue modules. A more powerful (and dangerous) technique is
to make QueueElement a generic pointer, by doing
typedef void* QueueElement
Now the Queue module can handle Queues which hold any kind of pointer. The danger is that a client
might get confused and call getFront() or Enqueue() on the wrong kind of pointer. Using void*
means that you will not find out about this problem until you run the program and get a segmentation
8
fault. These types of pointer errors can be very difficult to debug. Given these warnings, I would
recommend the safer solution for those students who do not have extensive C experience.
The equivalent queue of ‘anything’ in Java is accomplished by simply defining the data field in the
private node class to be Object rather than int. This is essentially the same as using void* in C, but
without the same danger of runtime errors. If such an error does occur, Java’s exception handling
mechanism should make it easier to track down. Better yet, starting with JDK 1.5, Java offers a new
mechanism for abstracting over data types called generics, which is similar to the notion of a Template
in C++. See http://java.sun.com/j2se/1.5/pdf/generics-tutorial.pdf for a nice tutorial on the subject.
9