PRINCIPLES OF COMPUTER SCIENCE
MODULE 15 - DATA STRUCTURES - I: ARRAYS AND LISTS

The objectives of this module are:
1. To become familiar with how data is arranged within the computer.
2. To get an idea of how internal storage is abstracted from the user.
3. To think of data structures as abstract tools.
4. To understand the implementation of arrays, lists and stacks.
5. To gain knowledge of the various operations on arrays and lists.

15.1. Introduction:
Data refers to a value or a set of values. Collections of data are frequently organized into fields, records and files. The logical model of a particular organization of data is called a data structure. Earlier we learned that high-level languages let programmers express algorithms as though the data being manipulated were stored in certain commonly used data structures, which the language provides as primitive structures. The choice of a model or data structure depends on two factors. First, the structure must be rich enough to reflect the actual relationships of the data in the real world. Second, it must be simple enough that the data can be processed efficiently whenever necessary.

15.2. Basic Data Structures:
Different data structures serve different purposes, and depending on the application and how the stored data is used, some are more useful than others. An array is appropriate for storing related data items and manipulating them. A list is a collection of entries arranged sequentially; by restricting how its entries are accessed, a list becomes either a stack or a queue. A tree is a collection of entries with a hierarchical organization. Each of these commonly used structures is discussed in detail.

15.3. Arrays:
The simplest type of data structure is a one-dimensional array.
It is a list of a finite number of data elements. A homogeneous array is a rectangular block of data whose entries are all of the same type, and each entry is identified by indices that denote its position. A two-dimensional homogeneous array consists of rows and columns, and each position is identified by a pair of indices: the first index gives the row and the second gives the column. If an m × n array is named A, its elements are denoted A[1][1], A[1][2], ..., A[m][n].

For example, a linear array Student that stores the names of five students is given below.

Student
1  Tom
2  Mary
3  Alan
4  Asin
5  Arjun

Here Student[1] denotes Tom, Student[2] denotes Mary, and so on. Linear arrays are also known as one-dimensional arrays, and each element is referenced by a single subscript. The number of elements stored in the array is called the length or size of the array.

A two-dimensional array is an array of one-dimensional arrays, that is, a collection of similar data items in which each item is referenced by two subscripts. For example, the monthly sales of a sales force can be held in a rectangular array of numbers in which each row holds the monthly sales of one member and each column holds the sales of every member for one month. The entry in the first row and fifth column is then the sales made by the first salesperson in May.

Each programming language has its own rules for declaring arrays. However, every declaration must provide three pieces of information: the name of the array, the data type of its elements, and the index set or size of the array. Some programming languages allocate memory for arrays statically during compilation, while others allocate it dynamically, reading the size of the array at run time.

A heterogeneous array is a block of data items of different types. The items are usually called components.
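The array declarations described above (a linear array, a two-dimensional array, and static versus dynamic allocation) can be sketched in C++, the language this module later uses for linked lists. This is a minimal illustration under stated assumptions: the five names come from the Student example, while the Sales dimensions (3 members by 12 months) and the helper makeDynamicArray are hypothetical. C++ indices start at 0, so the text's Student[1] is Student[0] here.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// One-dimensional (linear) array: name Student, element type std::string, size 5.
// C++ indices start at 0, so Student[0] corresponds to the text's Student[1].
const std::array<std::string, 5> Student = {"Tom", "Mary", "Alan", "Asin", "Arjun"};

// Two-dimensional array: one row per sales member, one column per month.
// Both sizes are fixed at compile time (static allocation); kMembers is hypothetical.
constexpr std::size_t kMembers = 3;
constexpr std::size_t kMonths = 12;
double Sales[kMembers][kMonths] = {};  // every entry starts at 0.0

// Dynamic allocation: the size n is known only at run time.
// makeDynamicArray is a hypothetical helper, not a standard function.
std::vector<double> makeDynamicArray(std::size_t n) {
    return std::vector<double>(n, 0.0);  // n elements, each 0.0
}
```

With 0-based indices, Sales[0][4] is the first row and fifth column, that is, the first salesperson's sales for May.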
An example of a heterogeneous array is Student, with the components Student_RegNo (type int), name (type char), marks (type float) and grade (type char).

15.3.1. Homogeneous Arrays:
Suppose we want to store a sequence of 25 numbers that are divisible by 8, each of which requires one memory cell, and we want to access the sequence by position, for example as the first number or the third number. This can be achieved by storing the numbers in a sequence of 25 memory cells with consecutive addresses. Most translators of high-level programming languages use this technique to implement one-dimensional homogeneous arrays. When a translator encounters a statement such as int Number[25];, it reserves 25 consecutive memory cells. In a language whose indices start at 1, the statement Number[5] = 40; places the number 40 in the fifth position of the array. Most high-level languages, however, start array indices at 0, so in those languages Number[5] refers to the sixth position.

Now suppose we want to record the sales made by the sales executives of a company over a 10-day period. The data can be arranged in a two-dimensional homogeneous array in which each row holds the sales made by one sales executive and each column holds all the sales made on one day. Because this array is static, we can calculate the memory needed for the entire array in advance and reserve a block of contiguous memory cells for it, storing the entries row by row. The address of any entry can then be computed from the starting address of the block and the entry's row and column. The expression

$c \times (i-1) + (j-1)$

is called the address polynomial, where c is the number of entries in each row, i is the row and j is the column. Adding its value to the starting address of the block gives the address of the entry in the ith row and jth column, assuming each entry occupies one memory cell.

15.3.2. Heterogeneous Arrays:
Sometimes an array must store values of different types, such as the details of an employee: a name of type char, an age of type integer and a skill_rate of type real. If the number of memory cells required by each component is fixed, the array can be stored in a block of contiguous cells. For this specification, assume that name requires 25 cells, age requires 2 cells and skill_rate requires 4 cells. We can then reserve 31 contiguous cells, where the first 25 are occupied by name, the next 2 by age and the last 4 by skill_rate. With this arrangement, each component can be accessed easily. If the address of the first cell is x, Employee.name refers to the first 25 cells, Employee.age refers to the 2 cells starting at x + 25, and Employee.skill_rate refers to the 4 cells starting at x + 27.

Another way to store a heterogeneous array is to store each component in a separate area of memory and link the components together by means of pointers. This arrangement is used when the size of the array's components is dynamic.

15.3.3. Operations on Arrays:
Many operations are possible on arrays. The most commonly used ones are listed below.
Insertion: adding a new element to an array
Deletion: removing an element from an array
Traversal: processing each element in the array
Search: finding the location of a given item
Sorting: arranging the elements in some specified order
Merging: combining two arrays into a single array
Reversing: reversing the order of the elements in the array

15.4. Lists:
The list is one of the most widely used data structures in computing.
Linked lists in particular offer a flexibility that arrays lack when entries are added and removed often. We can use either an array or a list to store similar data in memory, but arrays have limitations: they need contiguous memory locations, and inserting or deleting elements is difficult. Linked lists overcome these difficulties. A linked list can grow and shrink during its lifetime and has no fixed maximum size other than the memory available. Because its elements, or nodes, are stored at different memory locations, a linked list does not need one large free block, so it is less likely to run short of memory as it grows.

In general, a list is a linear collection of data items, such as a shopping list or a list of books. As mentioned above, if an array is used to store a list, the result is a contiguous list. Although this is convenient for static lists, inserting and deleting names requires time-consuming shuffling of entries. In the worst case, adding entries makes it necessary to move the entire list to a new location to obtain a free block of cells large enough for the expanded list. In a linked list, by contrast, individual entries are stored in different locations of memory rather than together in one large contiguous block. Even though the entries are scattered, they are bound together by explicit links, and this linkage gives the linked list its name.

To keep track of the beginning of a linked list, a pointer is used that holds the address of the first entry. Since this pointer points to the beginning, or head, of the list, it is called the head pointer. To mark the end of a linked list, we use a NULL pointer (NIL pointer), which is a special bit pattern placed in the pointer cell of the last entry to indicate that there are no further entries in the list. Each element of a linked list is called a node, and each node stores two items of information: an element of the list in its data part and a link in its pointer (link) part.
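The node layout, head pointer and NULL terminator described above can be sketched in C++ as follows. This is a minimal sketch, assuming nullptr stands in for the NULL pointer; the helper names pushFront, joinList and freeList are hypothetical, and the data part holds strings such as the items of a shopping list.

```cpp
#include <string>

// A node: the data part holds one list element, the link part holds
// the address of the next node (nullptr plays the role of the NULL pointer).
struct Node {
    std::string data;
    Node* link;
};

// Adds a node at the beginning of the list; head is the head pointer.
void pushFront(Node*& head, const std::string& value) {
    head = new Node{value, head};
}

// Traversal: follow the head pointer, then hop from node to node
// until the NULL pointer is reached.
std::string joinList(const Node* head) {
    std::string out;
    for (const Node* p = head; p != nullptr; p = p->link) {
        if (!out.empty()) out += ", ";
        out += p->data;
    }
    return out;
}

// Returns every node's memory and leaves head as an empty list.
void freeList(Node*& head) {
    while (head != nullptr) {
        Node* next = head->link;
        delete head;
        head = next;
    }
}
```

Because each new node is created separately with new, the nodes may land anywhere in memory; only the link fields tie them together.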
The resulting linked list structure is usually drawn as a diagram in which each scattered block of memory used for the list is shown as a rectangle. Each rectangle is labeled to indicate its composition, and each pointer is drawn as an arrow from the pointer cell to the next node. Traversing a list involves following the head pointer to find the first entry and then following the pointers stored with the entries to hop from one node to the next until the NULL pointer is reached.

Figure: Linked List (HEAD points to the first node; each node has a Data part and a Link part; the last Link is NULL)

The following table shows a linked list that stores a string in memory. Each node stores a single character, and the string is obtained by following the links. Only the cells used by the list are shown.

START = 8

Index  DATA  LINK
3      O     0
4      E     6
6      L     10
8      H     4
10     L     3

START = 8, so DATA[8] = H is the first character.
LINK[8] = 4, so DATA[4] = E is the second character.
LINK[4] = 6, so DATA[6] = L is the third character.
LINK[6] = 10, so DATA[10] = L is the fourth character.
LINK[10] = 3, so DATA[3] = O is the fifth character.
LINK[3] = 0, the NULL value, so the list ends.
The string is HELLO.

Several operations can be performed on a linked list. A linked list can be extended by adding new nodes at the beginning, in the middle or at the end. Nodes can be deleted without seriously affecting the other nodes, and the nodes can be sorted in ascending or descending order.

In C++, we can use a structure to represent a node containing a data part and a link part. A variable P is declared as a pointer to a node and is used to point to the first node in the list. Many nodes can be added to the linked list, but P continues to point to the first node. When no nodes have been added to the list, P is NULL.

To illustrate the advantages of a linked list over a contiguous structure, consider the insertion and deletion operations.
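Following the C++ description above, where P points to the first node, the insertion and deletion operations can be sketched as pointer changes. This is a minimal sketch under assumptions: each node holds one character, as in the HELLO example, and insertAfter, deleteAfter and toString are hypothetical helper names. Both operations touch only the links of neighboring nodes, whereas a contiguous list would have to shift entries.

```cpp
#include <string>

// Node with a data part (one character) and a link part.
struct Node {
    char data;
    Node* link;
};

// Insertion after prev: the new node's link takes prev's old successor,
// then prev's link is redirected to the new node. No entries move.
Node* insertAfter(Node* prev, char value) {
    Node* fresh = new Node{value, prev->link};
    prev->link = fresh;
    return fresh;
}

// Deletion of the node after prev: one pointer change bypasses it.
void deleteAfter(Node* prev) {
    Node* victim = prev->link;
    if (victim == nullptr) return;  // prev is the last node; nothing to delete
    prev->link = victim->link;      // the single pointer change
    delete victim;                  // return the node's memory
}

// Traversal from the first node P, collecting the characters.
std::string toString(const Node* P) {
    std::string out;
    for (const Node* q = P; q != nullptr; q = q->link) out += q->data;
    return out;
}
```

Deleting the first node itself would instead mean updating P to P->link; the helpers above only handle a node that has a predecessor.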
Deleting an entry from a contiguous list leaves a hole, so the entries following the deleted one must be moved forward to keep the list contiguous. In a linked list, however, an entry can be deleted by changing a single pointer: the pointer that pointed to the deleted node is modified to point to the node that follows it. During traversal the deleted entry is then passed by, because it is no longer part of the chain.

Figure: Deletion of a node from a linked list (the old pointer to the deleted node is replaced by a new pointer to the node after it)

Inserting a new entry involves finding an unused memory block large enough to hold the new entry and a pointer, and filling the link part of the new entry with the address of the entry that should follow it. Then the pointer of the entry that should precede the new entry is changed to point to the new entry. When the list is traversed, the new entry appears in its proper place.

Figure: Insertion of a new node into a linked list (new pointers link the new entry between its predecessor and successor, replacing the old pointer)

15.4.1. Circular Linked List:
The linked list discussed above is linear and is called a linear linked list. Its elements are accessed by first setting a pointer to the first node in the list and then traversing the list with that pointer. A linear linked list is useful in many ways, but it has a disadvantage: given a pointer P to a node, it is impossible to reach any node that precedes the node P points to. This problem can be eliminated with a small change: the link of the last node contains a pointer to the first node instead of NULL. This arrangement is called a circular list. The structure of a circular linked list is shown below.
Figure: A Circular Linked List

From any node in a circular list it is possible to reach any other node. A circular linked list has no natural first or last node; an external pointer simply marks where processing begins. A circular linked list can be used to represent a stack or a queue.

15.4.2. Doubly Linked List:
The lists described so far contain nodes that give information only about the next node in the list; a node has no knowledge of where the previous node lies in memory. If we are at the 17th node of such a list, reaching the 16th node requires traversing the list again from the first node. To make this easy, each node can store the address of the previous node along with the address of the next node. This arrangement is called a doubly linked list and is shown below.

Figure: Node of a Doubly Linked List (fields Prev, Data, Next)

Figure: An example of a Doubly Linked List

A doubly linked list is implemented in C++ with a structure that has one data part and two pointer parts: one stores the address of the next node and the other stores the address of the previous node.

One important application of linked lists is storing polynomials. A polynomial such as $8x^4 + 3x^3 - 6x^2 + 15x - 23$ can be stored in a linked list in which each node holds three elements: the coefficient, the exponent and a link to the next node. It is assumed that the exponent of each successive term is less than that of the previous term. Once a linked list has been built to represent a polynomial, it can be used to perform common polynomial operations such as addition, subtraction and multiplication. For example, to add two polynomials stored in poly1 and poly2, both lists are traversed from the start until the end of one of them is reached, and their terms are compared one by one.
If the exponents of the two terms being compared are equal, their coefficients are added and the result is stored in the third polynomial, poly3. If the exponents are not equal, the term with the larger exponent is copied to poly3 and the traversal advances past that term only. When the end of one list is reached, the remaining terms of the other polynomial are simply appended to poly3. Displaying poly3 then displays the result.
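The addition procedure above can be sketched in C++ as a merge of two term lists. This is a minimal sketch, assuming each list is already sorted by strictly decreasing exponent and using the hypothetical names Term, append and addPoly. Unlike the description above, it also drops a term whose coefficients cancel to zero, and it does not free the nodes it allocates.

```cpp
// One term of a polynomial: coefficient, exponent and link to the next term.
// Terms are kept in strictly decreasing order of exponent.
struct Term {
    int coeff;
    int exp;
    Term* link;
};

// Appends a term at the end of a list; tail tracks the last node.
void append(Term*& head, Term*& tail, int coeff, int exp) {
    Term* t = new Term{coeff, exp, nullptr};
    if (tail == nullptr) head = t; else tail->link = t;
    tail = t;
}

// poly3 = poly1 + poly2, comparing the two lists term by term.
Term* addPoly(const Term* p1, const Term* p2) {
    Term* head = nullptr;
    Term* tail = nullptr;
    while (p1 != nullptr && p2 != nullptr) {
        if (p1->exp == p2->exp) {             // equal exponents: add coefficients
            int sum = p1->coeff + p2->coeff;
            if (sum != 0) append(head, tail, sum, p1->exp);  // skip cancelled terms
            p1 = p1->link;
            p2 = p2->link;
        } else if (p1->exp > p2->exp) {       // copy the larger-exponent term
            append(head, tail, p1->coeff, p1->exp);
            p1 = p1->link;
        } else {
            append(head, tail, p2->coeff, p2->exp);
            p2 = p2->link;
        }
    }
    // One list is exhausted: append the remaining terms of the other.
    for (const Term* rest = (p1 != nullptr) ? p1 : p2; rest != nullptr; rest = rest->link)
        append(head, tail, rest->coeff, rest->exp);
    return head;
}
```

For instance, adding $2x^3 + 6x^2 + 5$ to the polynomial $8x^4 + 3x^3 - 6x^2 + 15x - 23$ produces the terms $8x^4$, $5x^3$, $15x$ and $-18$; the $x^2$ terms cancel and are skipped.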