LISTS
Data structures are defined by the processes which are required to produce
and maintain them. This makes them prime candidates for objects implemented
with object oriented languages. In their ideal form, the processes work
with generic data or with data that is specified at the time that the data
structure is produced and used. Thus it should not matter whether the data
is a single number or a collection of large records with many kinds of data.
Specifying a structure by its processes, together with the data that those
processes use, gives an ABSTRACT DATA TYPE (ADT).
The following 5 operations define an ADT known as an ordered list.
1. Create - an empty ordered list
2. Length - returns the number of items currently in the list
3. Insert - insert an item into the list
4. Delete - delete an item from a list
5. Retrieve - return the value of the specified item in the list
Notice that the operations which define the list are not concerned
with how the list is implemented. It may be done with an array or
dynamically allocated as each part is required (linked list). Some of the
operations are concerned with the maintenance of the ADT and others with the
accessing of information, but both are required for the definition.
The implementation language affects how much work is needed but not the
definition. For example, if the language already understands the
relationships among the digits that make up integers, the task of
implementation is simplified, because the relational operators already allow
us to determine whether one key is larger than another. If not, then the
processes which are required to build the relationships must be built into
the code, for example by writing operators which determine the relationships
among enumerated types used as keys in a data structure. Such additions do
not change the specification of the processes which are necessary to make
the list.
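As a minimal sketch of this separation, the five operations can be written
as an interface that says nothing about arrays or nodes. The class name
OrderedList and the method signatures are illustrative assumptions, not part
of the original notes. Create is mapped onto the constructor of whatever
concrete class implements the interface.

```python
from abc import ABC, abstractmethod


class OrderedList(ABC):
    """The ordered-list ADT: operations only, no storage decisions.

    "Create" corresponds to the constructor of a concrete subclass.
    """

    @abstractmethod
    def length(self):
        """Return the number of items currently in the list."""

    @abstractmethod
    def insert(self, item):
        """Insert an item at the position the ordering rule requires."""

    @abstractmethod
    def delete(self, item):
        """Delete an item from the list."""

    @abstractmethod
    def retrieve(self, position):
        """Return the value of the specified item in the list."""
```

An array version and a linked version would both subclass this interface,
and code written against it would not need to know which one it was given.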
A list is a collection of items in which each item has a specific
position. The specification for positioning the items provides some rules of
order, so this data structure is called an ordered list. Some common
ordered lists are:
* chronologically ordered - in which items are inserted in the order in
which they arrive for insertion
* frequency ordered - in which the items used most often are closest to
the beginning of the list
* sorted - in which the items are placed in ascending or descending
order.
By choosing a data structure we can specify the ordered sequence and specify
what needs to be done to maintain the sequence. There are two common ways
of implementing lists using native data types. These are array
implementations and linked list implementations.
Array
This is the simplest representation. Each item is followed by the next and
the sequence is defined by the structure which contains the items. The
physical organization of the data structure defines the logical organization
of the items.
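Because the physical order is the logical order, keeping an array sorted
means moving items to open a gap. The sketch below assumes a fixed-size
array with a separate count of occupied slots; the function name
insert_sorted is hypothetical.

```python
def insert_sorted(items, count, new_item):
    """Insert new_item into the sorted prefix items[0:count].

    Returns the new count. Raises OverflowError if the array is full.
    """
    if count == len(items):
        raise OverflowError("array is full")
    i = count
    # Shift larger items one slot to the right to open a gap.
    while i > 0 and items[i - 1] > new_item:
        items[i] = items[i - 1]
        i -= 1
    items[i] = new_item
    return count + 1


slots = [None] * 5
n = 0
for letter in "FACG":
    n = insert_sorted(slots, n, letter)
print(slots[:n])  # ['A', 'C', 'F', 'G']
```

The shifting loop is what makes insertion into a plain array O(n) even when
the position is already known.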
Linked Lists
In this representation, the structure does not necessarily reflect the
logical organization. Items may appear in any order. The logical
organization is provided through pointers. Each item in the list, called a
node, consists of a data portion containing the item information and a
pointer which contains the location of the next item in the logical sequence
which is to be maintained.
It is actually possible to have the advantages of the linked list by using
arrays, but in this case two kinds of information must be stored in arrays,
the information contained in the item and the pointer for that item.
Suppose that F, A, C, G are to be stored in an array which can hold a maximum
of 5 items. They are to be stored in the order in which they are read but
they should logically be available in alphabetical order.
The array must be set up first. Then some variable which keeps track of which
slot is available (i.e. empty) is required. A link field is required for
two things. For an empty slot, it specifies the next index which is available
after the current available space. For an occupied slot, it specifies the
next item in the logical order.
It is also possible to delete and reuse space using this method. However, a
disadvantage is that an array which is large enough to hold the longest list
must be declared, which may result in a large waste of memory in many cases.
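The F, A, C, G example can be sketched as follows. This is one possible
layout, assuming two parallel arrays (data and link), a head index, a free
index, and -1 as the "no next slot" marker; all of these names are
illustrative choices rather than the notes' own.

```python
NIL = -1  # marks "no next slot"


class ArrayLinkedList:
    """Sorted linked list stored in two parallel arrays (data and link)."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        # Initially every slot is on the free chain: 0 -> 1 -> ... -> NIL.
        self.link = [i + 1 for i in range(capacity)]
        self.link[-1] = NIL
        self.head = NIL  # first slot in logical order
        self.free = 0    # first available slot

    def insert(self, item):
        if self.free == NIL:
            raise OverflowError("no free slot")
        slot = self.free
        self.free = self.link[slot]  # unhook the slot from the free chain
        self.data[slot] = item
        prev, cur = NIL, self.head
        while cur != NIL and self.data[cur] < item:
            prev, cur = cur, self.link[cur]
        self.link[slot] = cur
        if prev == NIL:
            self.head = slot         # special case: new first item
        else:
            self.link[prev] = slot

    def delete(self, item):
        prev, cur = NIL, self.head
        while cur != NIL and self.data[cur] != item:
            prev, cur = cur, self.link[cur]
        if cur == NIL:
            return False
        if prev == NIL:
            self.head = self.link[cur]
        else:
            self.link[prev] = self.link[cur]
        self.data[cur] = None
        self.link[cur] = self.free   # put the slot back on the free chain
        self.free = cur
        return True

    def in_order(self):
        out, cur = [], self.head
        while cur != NIL:
            out.append(self.data[cur])
            cur = self.link[cur]
        return out


lst = ArrayLinkedList(5)
for letter in "FACG":
    lst.insert(letter)
print(lst.data)        # ['F', 'A', 'C', 'G', None]  (order read)
print(lst.in_order())  # ['A', 'C', 'F', 'G']        (alphabetical)
```

No data moves on insertion or deletion; only link values change, and a
deleted slot becomes the first one reused.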
Singly linked lists
We will examine the singly-linked list as an abstract data type. Remember that
the abstraction is an advantage to the user but the implementor must deal with
all of the subprocesses which are abstracted for the user. Thus the user sees
five processes but the implementor must deal with all of the small processes
which are used by the five major processes which define the type. Whether the
implementation uses arrays or dynamic allocation is not relevant to the user
but it is important for the implementor. We'll look at each of the processes.
The create process is essentially an initialization process for the user. It
sets up the program so that a list can be created. For the implementor this
means that the array must be defined and initialized and the variable
indicating the first open position set if it is an array implementation. If
dynamic allocation is used, the node structure must be defined.
The user can request the create process at any
time. The old list is then lost (or saved if the software allows) and the
initialization is done again. Creation is an O(n) process for arrays, where
n is the maximum length of the list (i.e. array size), but an O(1) operation
for dynamic allocation.
Length is usually implemented by keeping a variable which contains the length
of the list. Note that this implementation requires that insert and delete
also update that variable. For array implementations it is
usually stored as an integer variable. For dynamic allocation it may also be
an integer, stored in the header node for the list.
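A minimal sketch of create and length for the two implementations follows.
The class and field names (ArrayList, LinkedList, first_open) are assumptions
made for illustration only.

```python
class ArrayList:
    """Array implementation: create must set up every slot."""

    def __init__(self, max_items):
        self.items = [None] * max_items  # O(n) in the array size
        self.first_open = 0              # index of the first open position
        self.length = 0                  # kept up to date by insert/delete


class LinkedList:
    """Dynamic implementation: create only sets up the header."""

    def __init__(self):
        self.head = None                 # O(1): no nodes exist yet
        self.length = 0                  # stored with the list header


def length(lst):
    """Return the stored count; no traversal is needed."""
    return lst.length
```

Storing the count makes length O(1) for both versions, at the cost of one
extra update in every insert and delete.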
In order to find a specified item in a list the retrieve process is used.
This is essentially a search process. In a singly-linked list, all items
are accessed by beginning at the front or head of the list and going through
until the required position is found. Retrieve does not need to know how
the list is ordered (i.e. ascending, frequency, etc.) since the structure is
built so that each item is followed by the next according to the ordering
relationship. The implementation may be flexible enough that we can
retrieve an item by its key or as the nth element. In this case
retrieve takes additional parameters. Retrieve is O(n) in the worst case
since the last item may be the one which is required or the item does not
exist in the list. n in this case is the number of items in the list, not
the maximum possible number of items. We could make retrieve more efficient
at indicating that an item does not exist, but in doing so we also remove
some of the generality of the data type. Only one procedure should require
modification when we change the ordering relationship (see insertion). If
we also incorporate the same code in retrieve so that the search can stop
when the item position is passed, we limit retrieve to that kind of ordered
list. Note also that, from the user's point of view, retrieve is independent
of the data structure. Whether an array or a series of nodes is accessed is
not important to the user.
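Both forms of retrieve can be sketched for a dynamically allocated list as
below. The Node class and the function names are illustrative assumptions.
Neither function looks at the ordering relationship, so both work for any
kind of ordered list.

```python
class Node:
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


def retrieve_by_key(head, key):
    """Return the first node whose data equals key, or None if absent."""
    node = head
    while node is not None:
        if node.data == key:
            return node
        node = node.next
    return None  # the whole list was examined: the O(n) worst case


def retrieve_nth(head, n):
    """Return the data of the nth item (1-based)."""
    node = head
    for _ in range(n - 1):
        if node is None:
            break
        node = node.next
    if node is None or n < 1:
        raise IndexError("position out of range")
    return node.data


head = Node("A", Node("C", Node("F", Node("G"))))
print(retrieve_nth(head, 3))       # F
print(retrieve_by_key(head, "B"))  # None
```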
Insert puts a new item in the list. Obviously, it is important to know the
ordering relationship in order to implement this process. Looking at the
implementation more carefully, there are several steps. We have to get an
open space in the array implementation or create a new node for dynamically
allocated nodes. The information is then placed in the newly acquired
position. Finding the correct position for the new item in the list is
dependent on the ordering relationship. Thus if we change the type of
linked list this procedure must be changed, or alternately, we can pass a
flag to the procedure which uses the portion of the code for that ordering
relationship. Having found the position, the insert procedure now sets the
links so that the new item is in the appropriate position with respect to
the list ordering. The links, not the position physically occupied,
determine the logical position. Note that insertion at the head of the list
is a special case which must be dealt with. Most often the array
implementation has the link field (described earlier) so that data need not
be moved each time a new item is inserted. Setting the links is O(1) but
the process of finding the position may be O(n) where n is the number of
items in the list. This makes the total process O(n) in the worst case and
O(1) in the best case, with about n/2 nodes examined on average, which is
still O(n).
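The steps above can be sketched as follows, assuming dynamically allocated
nodes. The ordering relationship is passed in as a function (here called
goes_before, a hypothetical name) instead of a flag, so that only this one
argument changes when the kind of ordered list changes.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def find_position(head, item, goes_before):
    """Return (previous, current): the new item belongs between them."""
    prev, cur = None, head
    while cur is not None and not goes_before(item, cur.data):
        prev, cur = cur, cur.next
    return prev, cur


def insert(head, item, goes_before=lambda new, old: new < old):
    """Insert item and return the (possibly new) head of the list."""
    node = Node(item)                 # acquire space and store the item
    prev, cur = find_position(head, item, goes_before)  # may be O(n)
    node.next = cur                   # setting the links is O(1)
    if prev is None:
        return node                   # special case: insertion at the head
    prev.next = node
    return head


def to_list(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out


ascending = None
arrival = None
for letter in "FACG":
    ascending = insert(ascending, letter)
    # Chronological order: a new item never goes before an existing one.
    arrival = insert(arrival, letter, lambda new, old: False)
print(to_list(ascending))  # ['A', 'C', 'F', 'G']
print(to_list(arrival))    # ['F', 'A', 'C', 'G']
```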
Delete also requires more than one step. The item must be found, the links
reset so that the integrity of the list is maintained, and the old item
destroyed. Finding the item can be done with the same procedure that finds
the node position in insert. Retrieve cannot be used, since the preceding
node must also be known; retrieve does not return it, but the find-position
procedure does. Having found the node, the links
are reset to maintain the list, and finally the node is destroyed. Once again,
deletion at the head of the list is a special case. In the array
implementation, the position must be placed back as available space, whereas
it is returned to the memory pool for dynamic allocation. The order of
deletion is the same as that of insertion.
Circular Linked Lists
In a singly linked list, searches start at the current position and proceed
to the end. If the whole list is to be searched from the beginning, this is
not a real problem. However sometimes there are advantages to being able to
start at the current point and go through the whole list.
Generally, this is the case when the sequence in which the items
appear is important but the logical relationships among them are not. For
example, a processor queue keeps giving a time slice to each process in the
sequence in which the list was made. New processes are added at the current
point, and when the end of the list is reached, processing starts over at
the beginning.
A circular linked list is one in which the last node of the list points to
the beginning node. Therefore, the whole list can be processed beginning at
any point and processing until the starting point is reached again.
While searches, insertions and deletions are often completed more quickly, the
worst case is still O(n).
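A short sketch of a circular list follows, assuming a node whose link
initially refers to itself. The names insert_after and traverse_from are
illustrative. The traversal stops when it arrives back at the node where it
started, not at an end marker.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = self  # a single node is already a circle


def insert_after(current, data):
    """Add a new node at the current point and return it."""
    node = Node(data)
    node.next = current.next
    current.next = node
    return node


def traverse_from(start):
    """Yield every item once, beginning at start."""
    node = start
    while True:
        yield node.data
        node = node.next
        if node is start:  # back where we began: the whole list was seen
            break


p1 = Node("P1")
p2 = insert_after(p1, "P2")
p3 = insert_after(p2, "P3")
print(list(traverse_from(p2)))  # ['P2', 'P3', 'P1']
```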
Doubly Linked Lists
One of the major disadvantages of a singly linked list is the inability to
move backward in the list. The pointers for any node only tell what comes
after. Therefore, if a node is to be inserted before the current
position, it can't be done directly because the node which refers to the
current node (its predecessor) is not available. There is a way to cope with
this but it involves changing the data items stored in two nodes.
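One common form of this workaround, assuming it is the technique meant here,
links the new node after the current node and then moves the data so that
the new item ends up first. The names below are illustrative.

```python
class Node:
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


def insert_before(current, data):
    """Make data appear before current's item without the predecessor."""
    moved = Node(current.data, current.next)  # new node takes the old item
    current.data = data                       # current node takes the new item
    current.next = moved


def to_list(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out


c = Node("C")
head = Node("A", c)
insert_before(c, "B")
print(to_list(head))  # ['A', 'B', 'C']
```

The cost is that any other reference to the current node now sees a
different item, which is one reason a doubly linked list is usually
preferred.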
A better way is to doubly link the list. Each node then has two references,
one which refers to the node before the current node and one which refers to
the node which follows the current node. It is then possible to move
forward or backward in the list and to insert before or after the current
node. Note however that memory requirements are now increased in order to
do this and the algorithms must cope with two references for each node. Once
again, some processes can be accomplished more quickly but worst case is
still O(n). Note as well that the dynamically allocated node, with its
forward and backward references, can still be represented in the array
implementation. However, we must now have two link fields in the array
rather than one and deal with both fields on insertion and deletion.
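A minimal sketch of a doubly linked node and the link updates it requires is
given below. The names DNode, insert_before, insert_after and unlink are
assumptions for illustration, and keeping track of the head of the list is
left to the caller.

```python
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None  # node before this one
        self.next = None  # node after this one


def insert_before(current, data):
    node = DNode(data)
    node.prev = current.prev
    node.next = current
    if current.prev is not None:
        current.prev.next = node
    current.prev = node
    return node


def insert_after(current, data):
    node = DNode(data)
    node.prev = current
    node.next = current.next
    if current.next is not None:
        current.next.prev = node
    current.next = node
    return node


def unlink(node):
    """Remove node; both neighbours must be updated."""
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None


b = DNode("B")
a = insert_before(b, "A")
d = insert_after(b, "D")
c = insert_before(d, "C")

forward, node = [], a
while node is not None:
    forward.append(node.data)
    node = node.next
backward, node = [], d
while node is not None:
    backward.append(node.data)
    node = node.prev
print(forward)   # ['A', 'B', 'C', 'D']
print(backward)  # ['D', 'C', 'B', 'A']
```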
Sparse Matrices as Linked Lists
Sparse matrices are those in which most of the values do not exist or are not
important in any of the calculations (e.g. the value is 0). Most of the memory
is eaten up storing irrelevant data. By using a linked list it is possible to store
only the values required.
The linked list implementation requires that the ADT really have a
2-dimensional aspect. There must be ways to point to rows and columns and
there must be a way to link the actual data. Therefore, two different node
structures are required.
One possible way is to have one set of nodes used to define rows and
another to define the columns. Each row node has a data element indicating
the row number and two links. One link points to the next row, the other points
to the first node in the row which contains matrix information. Similarly, the
same structure is used for the column nodes. A row reference points to the
first row node and a column reference points to the first column node.
The other structure is a node in which the data element contains three
parts, the value of the element in the matrix, its row location, and its
column location. These nodes also each have two references. One
points to the next node in the same column, the other to the next node in
the same row. The list then actually works to provide the structure and the
information contained in the sparse matrix.
A disadvantage of having the row and column nodes structured as a
singly-linked list is that there is not random access to a particular row or
column. Therefore, an array of references often replaces the row list and the
same type of structure replaces the column list. The references point to the
first data element. While this gives random access to the first data
element in either a row or a column, the maximum size of the matrix is fixed
by the size of these reference arrays.
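The variant with arrays of references can be sketched as follows. Each data
node carries its value, row and column plus two links; the class and method
names (Entry, SparseMatrix, store, get, column) are hypothetical, and store
assumes the position is not already occupied.

```python
class Entry:
    """A data node: value, row, column, and two links."""

    def __init__(self, row, col, value):
        self.row, self.col, self.value = row, col, value
        self.next_in_row = None
        self.next_in_col = None


class SparseMatrix:
    def __init__(self, n_rows, n_cols):
        # Fixed-size arrays of references to the first node of each chain.
        self.row_heads = [None] * n_rows
        self.col_heads = [None] * n_cols

    def store(self, row, col, value):
        """Add a value at an empty position, threading both chains."""
        entry = Entry(row, col, value)
        prev, cur = None, self.row_heads[row]
        while cur is not None and cur.col < col:
            prev, cur = cur, cur.next_in_row
        entry.next_in_row = cur
        if prev is None:
            self.row_heads[row] = entry
        else:
            prev.next_in_row = entry
        prev, cur = None, self.col_heads[col]
        while cur is not None and cur.row < row:
            prev, cur = cur, cur.next_in_col
        entry.next_in_col = cur
        if prev is None:
            self.col_heads[col] = entry
        else:
            prev.next_in_col = entry

    def get(self, row, col):
        """Return the stored value, or 0 if nothing is stored there."""
        cur = self.row_heads[row]
        while cur is not None and cur.col < col:
            cur = cur.next_in_row
        if cur is not None and cur.col == col:
            return cur.value
        return 0

    def column(self, col):
        """Return (row, value) pairs for one column, top to bottom."""
        out, cur = [], self.col_heads[col]
        while cur is not None:
            out.append((cur.row, cur.value))
            cur = cur.next_in_col
        return out


m = SparseMatrix(4, 4)
m.store(2, 3, 7.0)
m.store(2, 1, 5.0)
m.store(0, 1, 9.0)
print(m.get(2, 3), m.get(1, 1))  # 7.0 0
print(m.column(1))               # [(0, 9.0), (2, 5.0)]
```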
If we use a sparse matrix to do a matrix multiply, the order of the
algorithm is still O(n^3), where n is the dimension of the matrix. If a
regular matrix of size 100 by 100 is used,
approximately 1,000,000 operations are required. If the matrix is sparse,
most of the multiplies and adds will involve zero. The sparse matrix does
not contain the zeros so the number may be radically reduced. For example, if
the sparse matrix really only contains 100 actual data values, an estimate of
the number of actual multiplies is 10 cubed or 1000. The order
of the algorithm does not change, but the worst case estimate based on the
full dimension is very much larger than the work actually done. A better
estimate uses the square root of the number of stored values in place of n,
since the work done is roughly that square root taken to the third power.
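The arithmetic can be checked with a small count. The sketch below uses a
dictionary of rows, not the linked structure described earlier, purely to
keep the example short, and it assumes the 100 stored values form a 10 by 10
block. The function name is hypothetical.

```python
def sparse_multiply_count(a_rows, b_rows):
    """Count scalar multiplications in A*B using only stored values.

    a_rows and b_rows map a row index to {column index: value}.
    """
    count = 0
    for row in a_rows.values():
        for k in row:
            # A[i][k] is multiplied by every stored value in row k of B.
            count += len(b_rows.get(k, {}))
    return count


dense_count = 100 ** 3
block = {i: {j: 1.0 for j in range(10)} for i in range(10)}
sparse_count = sparse_multiply_count(block, block)
print(dense_count, sparse_count)  # 1000000 1000
```

The count depends on where the stored values lie: 100 values on the main
diagonal would give only 100 multiplications, so the cube of the square root
is best read as a rough guide.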