LISTS

Data structures are defined by the processes required to produce and maintain them. This makes them prime candidates for objects implemented in object-oriented languages. In their ideal form, the processes work with generic data, or with data that is specified at the time the data structure is produced and used. Thus it should not matter whether the data is a single number or a collection of large records with many kinds of data. The combination of the processes that specify the structure and the data used by those processes makes an ABSTRACT DATA TYPE (ADT).

The following 5 operations define an ADT known as an ordered list.

1. Create - create an empty ordered list
2. Length - return the number of items currently in the list
3. Insert - insert an item into the list
4. Delete - delete an item from the list
5. Retrieve - return the value of the specified item in the list

Notice that the operations required to define the list are not concerned with how the list is implemented. It may be done with an array, or with nodes dynamically allocated as each is required (a linked list). Some of the operations are concerned with the maintenance of the ADT and others with accessing information, but both are required for the definition.

How much work the implementation takes depends on the language. For example, if the language used already understands the relationships among the digits that make up integers, the task of implementation is simplified, and the relational operators already allow us to determine whether one key is larger than another. If not, the processes required to build those relationships must be written into the code; for example, operators must be provided to determine the ordering among enumerated types used as keys in a data structure. These additions do not change the specification of the processes which are necessary to make the list.

A list is a collection of items in which each item has a specific position.
The specification for positioning the items provides some rules of order, so this data structure is called an ordered list. Some common ordered lists are:

* chronologically ordered - items are inserted in the order in which they arrive for insertion
* frequency ordered - the items used most often are closest to the beginning of the list
* sorted - the items are placed in ascending or descending order

By choosing a data structure we specify the ordered sequence and what needs to be done to maintain it. There are two common ways of implementing lists using native data types: array implementations and linked list implementations.

Array

This is the simplest representation. Each item is followed by the next, and the sequence is defined by the structure which contains the items. The physical organization of the data structure defines the logical organization of the items.

Linked Lists

In this representation, the structure does not necessarily reflect the logical organization. Items may appear in any physical order; the logical organization is provided through pointers. Each item in the list, called a node, consists of a data portion containing the item information and a pointer containing the location of the next item in the logical sequence which is to be maintained.

It is possible to have the advantages of the linked list while using arrays, but in this case two kinds of information must be stored in the arrays: the information contained in each item and the pointer (link) for that item. Suppose that F, A, C, G are to be stored in an array which can hold a maximum of 5 items. They are to be stored in the order in which they are read, but they should logically be available in alphabetical order. The array must be set up first. Then some counter is required to keep track of which slot is available (i.e. empty). A link field is required for two things.
It can specify the next available index after the current available space. It can also specify the logical order of the items which currently occupy space. It is also possible to delete items and reuse their space using this method. A disadvantage, however, is that an array large enough to hold the longest list must be declared, which in many cases wastes a large amount of memory.

Singly linked lists

We will examine the singly-linked list as an abstract data type. Remember that the abstraction is an advantage to the user, but the implementor must deal with all of the subprocesses which are hidden from the user. Thus the user sees five processes, but the implementor must deal with all of the small processes used by the five major processes which define the type. Whether the implementation uses arrays or dynamic allocation is not relevant to the user, but it is important for the implementor. We'll look at each of the processes.

The create process is essentially an initialization process for the user. It sets up the program so that a list can be created. For the implementor, in an array implementation this means that the array must be defined and initialized and the variable indicating the first open position set. If dynamic allocation is used, the node structure must be defined. The user can request the create process at any time; the old list is then lost (or saved, if the software allows) and the initialization is done. Creation is an O(n) process for arrays, where n is the maximum length of the list (i.e. the array size), but an O(1) operation for dynamic allocation.

Length is usually implemented by having a variable which contains the length of the list. Note that this implementation requires that insert and delete also update that variable. For array implementations it is usually stored as an integer variable. For dynamic allocation it may also be stored as an integer variable, kept in the node that represents the list itself (the list header).
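
The array-with-link-field scheme described above (F, A, C, G stored in five slots, read in arrival order but available alphabetically) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the class name ArrayLinkedList, the NIL marker, and the method names are hypothetical and not part of the notes, and ascending order is assumed as the ordering relationship.

```python
class ArrayLinkedList:
    """Ascending ordered list kept in fixed-size arrays with a link field.

    Hypothetical sketch: names and the choice of ascending order are assumptions.
    """

    NIL = -1  # marks "no next slot"

    def __init__(self, capacity):
        # Create: O(capacity), because every slot is chained onto the free list.
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.data = [None] * capacity
        self.link = list(range(1, capacity)) + [self.NIL]
        self.head = self.NIL   # first item in logical order
        self.free = 0          # first available (empty) slot
        self.count = 0         # length, maintained by insert and delete

    def length(self):
        return self.count

    def insert(self, item):
        if self.free == self.NIL:
            raise OverflowError("list is full")
        slot = self.free                 # take the first available slot
        self.free = self.link[slot]
        self.data[slot] = item
        prev, cur = self.NIL, self.head  # find position, remembering predecessor
        while cur != self.NIL and self.data[cur] < item:
            prev, cur = cur, self.link[cur]
        self.link[slot] = cur
        if prev == self.NIL:             # special case: insertion at the head
            self.head = slot
        else:
            self.link[prev] = slot
        self.count += 1

    def delete(self, item):
        prev, cur = self.NIL, self.head
        while cur != self.NIL and self.data[cur] != item:
            prev, cur = cur, self.link[cur]
        if cur == self.NIL:
            return False                 # item not in the list
        if prev == self.NIL:             # special case: deletion at the head
            self.head = self.link[cur]
        else:
            self.link[prev] = self.link[cur]
        self.data[cur] = None
        self.link[cur] = self.free       # return the slot to available space
        self.free = cur
        self.count -= 1
        return True

    def items(self):
        """Return the items in logical (link) order."""
        out, cur = [], self.head
        while cur != self.NIL:
            out.append(self.data[cur])
            cur = self.link[cur]
        return out


lst = ArrayLinkedList(5)
for ch in "FACG":
    lst.insert(ch)
print(lst.data)     # physical order: ['F', 'A', 'C', 'G', None]
print(lst.items())  # logical order:  ['A', 'C', 'F', 'G']
```

The data array keeps the arrival order while the link array carries the alphabetical order, and a deleted slot goes back onto the free chain so that a later insert can reuse it.
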

The retrieve process is used to find a specified item in a list. This is essentially a search process. In a singly-linked list, all items are accessed by beginning at the front, or head, of the list and going through it until the required position is found. Retrieve does not need to know how the list is ordered (e.g. ascending, by frequency), since the structure is built so that each item follows the previous one according to some ordering relationship. The implementation may be flexible enough that we can retrieve an item either by its key or as the nth element; in that case retrieve has additional parameters.

Retrieve is O(n) in the worst case, since the required item may be the last one or may not exist in the list. Here n is the number of items in the list, not the maximum possible number of items. We could make retrieve more efficient at detecting that an item does not exist, but in doing so we would also remove some of the generality of the data type. Only one procedure should require modification when we change the ordering relationship (see insertion). If we put the same ordering code into retrieve so that the search can stop once the item's position has been passed, we limit retrieve to that kind of ordered list. Note also that, from the user's point of view, retrieve is independent of the data structure: whether an array or a series of nodes is accessed is not important to the user.

Insert puts a new item in the list. Obviously, it is important to know the ordering relationship in order to implement this process. Looking at the implementation more carefully, there are several steps. We have to get an open space in the array implementation, or create a new node when nodes are dynamically allocated. The information is then placed in the newly acquired position. Finding the correct position for the new item in the list depends on the ordering relationship.
Thus if we change the type of ordered list, this procedure must be changed; alternatively, we can pass a flag to the procedure which selects the portion of the code for that ordering relationship. Having found the position, the insert procedure sets the links so that the new item is in the appropriate position with respect to the list ordering. The links, not the position physically occupied, determine where the item sits in the list. Note that insertion at the head of the list is a special case which must be dealt with. Most often the array implementation has the link field (described earlier) so that data need not be moved each time a new item is inserted. The insertion itself is O(1), but the process of finding the position may be O(n), where n is the number of items in the list. This makes the total process O(n) in the worst case and O(1) in the best case, with about n/2 nodes examined on average.

Delete also requires more than one step. The item must be found, the links reset so that the integrity of the list is maintained, and the old item destroyed. Finding the item can be done with the same procedure that finds the node position in insert. Retrieve cannot be used, since the preceding node must also be known; retrieve does not return it, but the find-position procedure does. Having found the node, the links are reset to maintain the list, and finally the node is destroyed. Once again, deletion at the head of the list is a special case. In the array implementation the freed position must be returned to the available space, whereas with dynamic allocation the node is returned to the memory pool. The order of deletion is the same as that of insertion.

Circular Linked Lists

In a singly linked list, searches start at the current position and proceed to the end. If the whole list is to be searched from the beginning, this is not a real problem. However, sometimes there are advantages to being able to start at the current point and go through the whole list.
Generally, this is the case when the sequence in which the items appear is important but the logical relationships among them are not. For example, a processor queue keeps giving a time slice to each process in the sequence in which the list was made. New ones are added at the current point, and when the end of the list is reached, processing starts over at the beginning. A circular linked list is one in which the last node of the list points to the first node. Therefore, the whole list can be processed by beginning at any point and continuing until the starting point is reached again. While searches, insertions and deletions are often completed more quickly, the worst case is still O(n).

Doubly Linked Lists

One of the major disadvantages of a singly linked list is the inability to move backward in the list. The pointer in any node only tells what comes after it. Therefore, if a node is to be inserted before the current position, it cannot be done directly, because the reference to the node preceding the current node is not available. There is a way to cope with this, but it involves changing the data items stored in two nodes. A better way is to doubly link the list. Each node then has two references: one to the node before the current node and one to the node which follows it. It is then possible to move forward or backward in the list and to insert before or after the current node. Note, however, that memory requirements are increased in order to do this, and the algorithms must cope with two references for each node. Once again, some processes can be accomplished more quickly, but the worst case is still O(n).

Note as well that the node with forward and backward references used in dynamic allocation can still be represented in the array implementation. However, we must now have two link fields in the array rather than one, and deal with both fields on insertion and deletion.
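
As a rough illustration of why the second reference helps, the following Python sketch inserts a node before a given node and deletes a node without searching for its predecessor. The names DNode, DoublyLinkedList, insert_before and the other methods are hypothetical, chosen for this example only. Dynamically allocated nodes are assumed rather than the two-link-field array form.

```python
class DNode:
    """Node with a backward (prev) and a forward (next) reference."""

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0

    def append(self, value):
        """Add a node at the end and return it."""
        node = DNode(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        self.count += 1
        return node

    def insert_before(self, current, value):
        """Insert before `current` in O(1); no search for the predecessor."""
        node = DNode(value)
        node.next = current
        node.prev = current.prev
        if current.prev is None:      # special case: new head of the list
            self.head = node
        else:
            current.prev.next = node
        current.prev = node
        self.count += 1
        return node

    def delete(self, node):
        """Unlink `node`; both neighbours are reachable from the node itself."""
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None
        self.count -= 1

    def forward(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out

    def backward(self):
        out, cur = [], self.tail
        while cur is not None:
            out.append(cur.value)
            cur = cur.prev
        return out


lst = DoublyLinkedList()
a = lst.append("A")
c = lst.append("C")
lst.insert_before(c, "B")
print(lst.forward())   # ['A', 'B', 'C']
print(lst.backward())  # ['C', 'B', 'A']
```

Both insert_before and delete touch two references per neighbour, which is the extra bookkeeping the text mentions. Finding a node by value would still take O(n).
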

Sparse Matrices as Linked Lists

Sparse matrices are those in which most of the values do not exist or are not important in any of the calculations (e.g. the value is 0). If the matrix is stored in full, most of the memory is used up storing irrelevant data. By using a linked list it is possible to store only the values required.

The linked list implementation requires that the ADT have a 2-dimensional aspect. There must be ways to point to rows and columns, and there must be a way to link the actual data. Therefore, two different node structures are required.

One possible way is to have one set of nodes define the rows and another define the columns. Each row node has a data element indicating the row number and two links. One link points to the next row node; the other points to the first node in the row which contains matrix information. The same structure is used for the column nodes. A row reference points to the first row node and a column reference points to the first column node.

The other structure is a node in which the data element contains three parts: the value of the element in the matrix, its row location, and its column location. These nodes also each have two references. One points to the next node in the same column, the other to the next node in the same row. The lists together thus provide both the structure and the information contained in the sparse matrix.

A disadvantage of having the row and column nodes structured as singly-linked lists is that there is no random access to a particular row or column. Therefore, an array of references often replaces the row list, and the same type of structure replaces the column list. The references point to the first data element in each row or column. While this gives random access to the first data element in either a row or a column, the maximum size of the matrix is fixed by the size of these reference arrays.

If we use a sparse matrix to do a matrix multiply, the order of the algorithm is still O(n^3).
If a regular matrix of size 100 by 100 is used, approximately 1,000,000 (100 cubed) operations are required. If the matrix is sparse, most of the multiplies and adds will involve zero. The sparse matrix does not contain the zeros, so the number may be radically reduced. For example, if the sparse matrix really contains only 100 actual data values, an estimate of the number of actual multiplies is 10 cubed, or 1000. The order of the algorithm does not change, but the worst-case estimate is very much larger than the work actually done. A better estimate is to use the square root of n, where n here is the number of values actually stored, since the work done is roughly the square root of n taken to the third power.
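
The array-of-references variant of the sparse matrix described above can be sketched in Python as follows. It is a minimal illustration, not a definitive implementation: the names Element, SparseMatrix, set, get, row_items and col_items are hypothetical, and a missing entry is assumed to mean the value 0. Removing an entry is left out to keep the sketch short.

```python
class Element:
    """Data node: value, row, column, plus one link per direction."""

    def __init__(self, row, col, value):
        self.row = row
        self.col = col
        self.value = value
        self.next_in_row = None   # next stored element in the same row
        self.next_in_col = None   # next stored element in the same column


class SparseMatrix:
    def __init__(self, n_rows, n_cols):
        # Reference arrays fix the maximum size but give random access
        # to the first stored element of any row or column.
        self.row_heads = [None] * n_rows
        self.col_heads = [None] * n_cols

    def set(self, row, col, value):
        """Store a value at (row, col); assumes the value is worth storing."""
        prev, cur = None, self.row_heads[row]
        while cur is not None and cur.col < col:
            prev, cur = cur, cur.next_in_row
        if cur is not None and cur.col == col:
            cur.value = value             # already stored: overwrite
            return
        node = Element(row, col, value)
        node.next_in_row = cur            # link into the row list
        if prev is None:
            self.row_heads[row] = node
        else:
            prev.next_in_row = node
        prev, cur = None, self.col_heads[col]
        while cur is not None and cur.row < row:
            prev, cur = cur, cur.next_in_col
        node.next_in_col = cur            # link into the column list
        if prev is None:
            self.col_heads[col] = node
        else:
            prev.next_in_col = node

    def get(self, row, col):
        cur = self.row_heads[row]
        while cur is not None and cur.col < col:
            cur = cur.next_in_row
        if cur is not None and cur.col == col:
            return cur.value
        return 0                          # assumption: absent means zero

    def row_items(self, row):
        """(column, value) pairs stored in one row, in column order."""
        out, cur = [], self.row_heads[row]
        while cur is not None:
            out.append((cur.col, cur.value))
            cur = cur.next_in_row
        return out

    def col_items(self, col):
        """(row, value) pairs stored in one column, in row order."""
        out, cur = [], self.col_heads[col]
        while cur is not None:
            out.append((cur.row, cur.value))
            cur = cur.next_in_col
        return out


m = SparseMatrix(100, 100)
m.set(3, 7, 2.5)
m.set(3, 2, 1.0)
m.set(50, 7, 4.0)
print(m.row_items(3))  # [(2, 1.0), (7, 2.5)]
print(m.col_items(7))  # [(3, 2.5), (50, 4.0)]
```

Only the three stored values occupy nodes. A multiply built on row_items and col_items would visit stored elements only, which is where the reduction in work discussed above comes from.
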