PRINCIPLES OF COMPUTER SCIENCE
MODULE 15
-
DATA STRUCTURES - I
ARRAYS AND LISTS
The objectives of this module are:
1. To be familiar with the data arrangements within the computer.
2. To get an idea about how the internal storage is abstracted from the
user.
3. To think of data structures in terms of abstract tools.
4. To understand the implementation of arrays, lists and stacks.
5. To have a knowledge about the various operations on arrays and lists.
15.1. Introduction:
Data refers to a value or a set of values. A collection of data is frequently
organized into fields, records and files. The logical model of a particular
organization of data is called a data structure. Earlier we learned that
high level languages provide techniques by which programmers can express
algorithms as though the data being manipulated were stored in certain
commonly used data structures. The selection of a model or data structure
depends on two factors.
First, the structure must be rich enough to reflect the actual relationships
of the data in the real world.
Second, the structure should be simple enough that the data can be
processed easily whenever necessary.
15.2. Basic Data Structures:
There are different data structures used for various purposes. Depending
on the application and on how the stored data is used, some data structures
are more useful than others. An array is appropriate for storing and
manipulating related data items. A list is a collection of entries arranged
sequentially. By restricting how the entries of a list may be accessed, a list
can be made to behave as either a stack or a queue. Another type, called a
tree, is a collection of entries having a hierarchical organization. Arrays
and lists are discussed here in detail.
15.3. Arrays:
The simplest type of data structure is a one dimensional array. It is a
list of a finite number of data elements. A homogeneous array is a rectangular
block of data whose entries are all of the same type. Individual entries are
identified by indices that denote their position. Two dimensional homogeneous
arrays consist of rows and columns in which positions are identified by pairs
of indices, where the first index represents the row and the second index
represents the column. If the name given to an array with m rows and n columns
is A, the elements of A are denoted as A[1][1], A[1][2], ..., A[m][n].
Example:
A linear array Student to store the names of five students is given below.
Student
1   Tom
2   Mary
3   Alan
4   Asin
5   Arjun
Here Student[1] denotes Tom, Student[2] denotes Mary, and so on.
Linear arrays are otherwise known as one-dimensional arrays, and each
element in the array is referenced by one subscript. The number of elements
stored in the array is called the length or the size of the array.
A two dimensional array is an array of one dimensional arrays, that is, a
collection of similar data items in which each item is referenced by two
subscripts. For example, consider a rectangular array of numbers representing
the monthly sales made by members of a sales force, in which each row holds
the monthly sales made by a particular member and each column holds the sales
made by every member in a particular month. Hence the entry in the first row
and fifth column represents the sales made by the first sales person in May.
Each programming language has its own rules for declaring arrays.
However, each declaration must provide the following information.

The name of the array

The data type of the array

The index set or the size of the array
Some programming languages allocate memory space for arrays statically
during compilation, while others allocate memory dynamically by reading
the size of the array at runtime.
A heterogeneous array is a block of data items of different types. The
items are usually called components. An example is a heterogeneous array
Student with components Student _Reg No (type int), name (type char), marks
(type float) and grade (type char).
15.3.1. Homogeneous Arrays:
Suppose we want to store a sequence of 25 numbers that are divisible by
8, each of which requires one memory cell of storage space. We also want to
manipulate the sequence and to access its entries by position, as the first
number, the third number and so on. This can be achieved by storing the
numbers in a sequence of 25 memory cells with consecutive addresses.
This technique is used by most translators of high level programming
languages to implement one dimensional homogeneous array. When a
statement like
int Number[25];
is encountered by a translator, it reserves 25 consecutive memory cells
for the array.
Conceptually, the statement Number[5] = 40 places the number 40 at the 5th
position of the array. However, many high level languages, including C and
C++, number array indices from 0 rather than 1, which means that Number[5]
actually refers to the 6th position of the array in such a language.
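The effect of zero-based indexing can be sketched in C++ as follows. This is only an illustration under the assumption that the array is filled with the first 25 multiples of 8; the helper names positionOf and demoNumberArray are hypothetical and do not come from the module.

```cpp
#include <cstddef>

// Hypothetical helper: the 1-based position that a 0-based C++ index refers to.
constexpr std::size_t positionOf(std::size_t index) { return index + 1; }

// Fill a 25-element array with multiples of 8, then overwrite index 5.
// Returns the value stored at index 5 afterwards.
inline int demoNumberArray() {
    int Number[25];                  // 25 consecutive cells, indices 0..24
    for (int k = 0; k < 25; ++k) {
        Number[k] = 8 * (k + 1);     // 8, 16, 24, ... (all divisible by 8)
    }
    Number[5] = 40;                  // replaces the value in the 6th position
    return Number[5];
}
```

Here positionOf(5) evaluates to 6, which matches the statement that Number[5] is the 6th entry when counting starts at 0.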
Assume our aim is to record the sales made by the sales executives of a
particular company over a 10-day period. Here the data can be arranged in a
two dimensional homogeneous array where each row holds the sales made
by a particular sales executive and each column holds all the sales made
on a particular day.
In this case, the array is static and we can calculate the amount of
memory needed for the entire array in advance and reserve a block of
contiguous memory cells for that array.
The starting address of the array is used to find the address of each
entry from its row and column values. The expression (c × (i-1)) + (j-1) is
called the address polynomial, where 'c' is the number of entries in each row,
'i' is the row and 'j' is the column. The value of this expression is added
to the starting address of the array to obtain the address of the entry in
the ith row and jth column, assuming that the rows are stored one after
another and that each entry occupies one memory cell.
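The address polynomial can be written as a small C++ function. This is a minimal sketch assuming row-by-row storage, one memory cell per entry, and rows and columns numbered from 1 as in the text; the function names addressPolynomial and entryAddress are hypothetical.

```cpp
#include <cstddef>

// Offset (in cells) of the entry in row i, column j of an array stored
// row by row, with c entries per row. Rows and columns are numbered from 1.
constexpr std::size_t addressPolynomial(std::size_t c, std::size_t i,
                                        std::size_t j) {
    return c * (i - 1) + (j - 1);
}

// Address of entry (i, j), given the starting address of the array and
// assuming that each entry occupies exactly one cell.
constexpr std::size_t entryAddress(std::size_t start, std::size_t c,
                                   std::size_t i, std::size_t j) {
    return start + addressPolynomial(c, i, j);
}
```

For the sales table with 10 columns, the entry in row 3 and column 4 lies (10 × 2) + 3 = 23 cells past the start, so with a starting address of 100 it would be found at address 123.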
15.3.2. Heterogeneous Arrays:
Sometimes an array needs to store values of different types, such as
the details of an employee. It may consist of a name of type char, an age of
type integer and a skill_rate of type real. If the number of memory cells
required by each component is fixed, we can store the array in a block of
contiguous cells. For the above specification, assume that we require 25 cells
for name, 2 cells for age and 4 cells for skill_rate. Then we can reserve 31
contiguous cells, where the first 25 will be occupied by the name, the next 2
by the age and the last 4 by the skill_rate.
With this arrangement, the different components can be accessed easily.
If the first cell is at address x, Employee.name translates to the 25 cells
starting at x, Employee.age translates to the 2 cells starting at x+25 and
Employee.skill_rate translates to the 4 memory cells starting at x+27.
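The offsets described above can be checked with a short C++ sketch. It models one memory cell as one unit of address and uses the cell counts given in the text; the constant and function names are hypothetical. Note that an actual C++ struct may be laid out differently, since a compiler is free to insert padding between members.

```cpp
#include <cstddef>

// Cell counts taken from the text.
constexpr std::size_t NAME_CELLS  = 25;
constexpr std::size_t AGE_CELLS   = 2;
constexpr std::size_t SKILL_CELLS = 4;

// Offsets of each component from the first cell x of the record.
constexpr std::size_t NAME_OFFSET  = 0;
constexpr std::size_t AGE_OFFSET   = NAME_OFFSET + NAME_CELLS;    // x + 25
constexpr std::size_t SKILL_OFFSET = AGE_OFFSET + AGE_CELLS;      // x + 27
constexpr std::size_t RECORD_CELLS = SKILL_OFFSET + SKILL_CELLS;  // 31 cells

// Address of a component, given the address x of the record's first cell.
constexpr std::size_t componentAddress(std::size_t x, std::size_t offset) {
    return x + offset;
}
```

With x = 1000, for instance, componentAddress(1000, SKILL_OFFSET) gives 1027, the first of the four cells holding skill_rate.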
An alternative to storing a heterogeneous array in a block of contiguous
memory cells is to store each component in a separate memory location and
then link the components together by means of pointers. This arrangement is
used if the size of the array's components is dynamic.
15.3.3. Operations on Arrays:
There are different operations possible on arrays. The most commonly
used operations performed on arrays are listed below.

Insertion:- Adding new elements to an array

Deletion:- Removing an item from an array

Traversal:- Processing each element in the array

Search:- Finding the location of a given item

Sorting:- Arranging elements in some specified order

Merging:- Combining two arrays into a single array

Reversing:- Reversing the order of elements in the array
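Some of the operations listed above can be sketched in C++ as follows. This is an illustrative sketch only: it uses std::vector as the underlying contiguous storage, positions are 0-based, and the function names are hypothetical. The explicit shifting loops show why insertion and deletion in a contiguous array take time proportional to the number of entries that must move.

```cpp
#include <cstddef>
#include <vector>

// Insertion: place value at position pos (0 <= pos <= size),
// shifting the later entries one cell to the right.
inline void insertAt(std::vector<int>& a, std::size_t pos, int value) {
    a.push_back(0);                                   // room for one more entry
    for (std::size_t k = a.size() - 1; k > pos; --k) {
        a[k] = a[k - 1];
    }
    a[pos] = value;
}

// Deletion: remove the entry at pos (0 <= pos < size),
// shifting the later entries one cell to the left.
inline void deleteAt(std::vector<int>& a, std::size_t pos) {
    for (std::size_t k = pos; k + 1 < a.size(); ++k) {
        a[k] = a[k + 1];
    }
    a.pop_back();
}

// Search: index of the first entry equal to value, or -1 if absent.
inline int linearSearch(const std::vector<int>& a, int value) {
    for (std::size_t k = 0; k < a.size(); ++k) {
        if (a[k] == value) return static_cast<int>(k);
    }
    return -1;
}

// Reversing: swap entries from both ends towards the middle.
inline void reverseInPlace(std::vector<int>& a) {
    if (a.empty()) return;
    for (std::size_t lo = 0, hi = a.size() - 1; lo < hi; ++lo, --hi) {
        int tmp = a[lo];
        a[lo] = a[hi];
        a[hi] = tmp;
    }
}
```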
15.4. Lists:
The list is one of the most widely used data structures in computing.
Lists are flexible, and for insertion and deletion they often perform better
than arrays. We can use an array or a list to store similar data in memory,
but arrays have limitations such as the need for contiguous memory
locations and the difficulty of inserting and deleting array elements.
Linked lists overcome these difficulties. A linked list can grow and shrink
in size during its lifetime, and it has no fixed maximum size beyond the
memory available. Since the elements or nodes may be stored at different
memory locations, a request for memory is less likely to fail for want of a
single large contiguous block.
In general, a list means a linear collection of data items. It may be a
shopping list or a list of books. As mentioned above, if an array is used for
storing a list, it will be a contiguous list. Though this is convenient for
storing static lists, deletion and insertion of entries cause time consuming
shuffling of the other entries. In the worst case, the addition of entries
creates the need to move the entire list to a new location to obtain an
available block of cells that can accommodate the expanded list.
But in the case of a linked list, individual entries are stored in different
locations of memory rather than together in a large contiguous block. Even
though the elements are scattered, they are bound together by explicit links
between them. This linkage system gives the linked list its name.
To keep track of the beginning of a linked list, a pointer is used which
saves the address of the first entry. Since this pointer points to the
beginning or head of the list, it is called the head pointer. To mark the end
of a linked list, we use a NULL pointer (NIL pointer), which is a special bit
pattern placed in the pointer cell of the last entry to indicate that there
are no further entries in the list. Each element of a linked list is called a
node, and each node stores two items of information: an element of the list
in the data part and a link in the pointer part or link part.
The final linked list structure is represented by the diagram in which we
show the scattered blocks of memory used for the list by individual rectangles.
Each rectangle is labeled to indicate its composition. Each pointer is
represented as an arrow from the pointer itself to the next node. Traversing a
list involves following the head pointer to find the first entry and, from
there, following the pointers stored with the entries to hop from one node to
the next until the NULL pointer is reached.
HEAD -> [ Data | Link ] -> [ Data | Link ] -> [ Data | Link ] -> NULL
Figure : Linked List
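The traversal just described can be sketched in C++ as follows. This is a minimal illustration, assuming each node holds a string in its data part; the names Node and traverse are hypothetical.

```cpp
#include <string>

// A node stores one list element (data part) and the address of the
// next node (link part). The last node's link is the NULL pointer.
struct Node {
    std::string data;
    Node* link;
};

// Follow the head pointer to the first entry, then hop from node to node
// until the NULL pointer is reached. Returns the entries in list order,
// separated by single spaces.
inline std::string traverse(const Node* head) {
    std::string out;
    for (const Node* p = head; p != nullptr; p = p->link) {
        if (!out.empty()) out += ' ';
        out += p->data;
    }
    return out;
}
```

An empty list is represented by a head pointer that is itself NULL, in which case the loop body never runs.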
A linked list to store a string in memory is shown here. Each node of the
list stores a single character. To obtain the actual string we have to follow the
link.
START = 8

Index  Data  Link
  1
  2
  3     O     0
  4     E     6
  5
  6     L    10
  7
  8     H     4
  9
 10     L     3
 11

START=8, so DATA[8]=H        First Character
LINK[8]=4, so DATA[4]=E      Second Character
LINK[4]=6, so DATA[6]=L      Third Character
LINK[6]=10, so DATA[10]=L    Fourth Character
LINK[10]=3, so DATA[3]=O     Fifth Character
LINK[3]=0, the NULL value, END of list.
The string is HELLO
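The same trace can be reproduced with two parallel arrays in C++. This sketch assumes that index 0 is left unused so that a link value of 0 can act as the NULL value, as in the example; the function names readString and helloExample are hypothetical.

```cpp
#include <string>

// Follow the links from start, collecting one character per node,
// until the link value 0 (the NULL value) is reached.
inline std::string readString(const char data[], const int link[], int start) {
    std::string out;
    for (int p = start; p != 0; p = link[p]) {
        out += data[p];
    }
    return out;
}

// The DATA and LINK arrays of the example. Cells that the list does not
// use are filled with a blank character and a link of 0.
inline std::string helloExample() {
    // index:              0    1    2    3    4    5    6    7    8    9    10   11
    const char DATA[12] = {' ', ' ', ' ', 'O', 'E', ' ', 'L', ' ', 'H', ' ', 'L', ' '};
    const int  LINK[12] = { 0,   0,   0,   0,   6,   0,   10,  0,   4,   0,   3,   0 };
    const int  START = 8;
    return readString(DATA, LINK, START);
}
```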
There are several operations that can be performed on a linked list. A
linked list can be extended by adding new nodes at the beginning, in the
middle or at the end. The nodes in the list can be deleted without moving
the other nodes. Further, the nodes can be sorted in ascending or
descending order.
In C++, we can use a structure to represent a node containing a data part
and a link part. A variable P is declared as a pointer to a node and is used
as the pointer to the first node in the list. As nodes are added after the
first node, P continues to point to the first node. When the list contains
no nodes, P is NULL.
To illustrate the advantages of a linked list over a contiguous structure,
the insertion and deletion operations can be considered. Deleting an
entry leaves a hole in a contiguous allocation, which means that the entries
following the deleted one must be moved forward to keep the list contiguous.
But in the case of a linked list, an entry can be deleted by changing a single
pointer. The pointer pointing to the deleted node is modified so as to point to
the node following the deleted node. Hence during traversal, the deleted entry
is passed by because it is no longer part of the chain.
Figure : Deletion of a node from a linked list
(The old pointer to the deleted node is replaced by a new pointer to the
node that follows the deleted node.)
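Deletion by changing a single pointer can be sketched in C++ as follows. This is one possible formulation, not the only one: it walks a pointer-to-pointer along the list so that the head pointer and the link parts are handled uniformly. The names Node and removeValue are hypothetical, and the unlinked node is returned to the caller rather than freed, since the text does not say how node memory is managed.

```cpp
struct Node {
    int data;
    Node* link;
};

// Unlink the first node whose data equals value by redirecting the one
// pointer that points to it. Returns the unlinked node, or nullptr if no
// node holds value. The other nodes are not moved.
inline Node* removeValue(Node*& head, int value) {
    Node** pp = &head;                  // the pointer that may need changing
    while (*pp != nullptr && (*pp)->data != value) {
        pp = &(*pp)->link;
    }
    if (*pp == nullptr) return nullptr; // value not found
    Node* removed = *pp;
    *pp = removed->link;                // bypass the deleted node
    removed->link = nullptr;
    return removed;
}
```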
Inserting a new entry involves finding an unused memory block large
enough to hold the new entry and a pointer, and filling the link part of the
new entry with the address of the entry in the list that should follow it.
Then the pointer of the entry that should precede the new entry is changed so
as to point to the new entry. Hence when the list is traversed, the new entry
will be found in its proper place.
Figure : Insertion of a new node into a linked list
(The link part of the new entry receives a new pointer to the node that
follows it, and the old pointer of the preceding node is replaced by a new
pointer to the new entry.)
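The two pointer changes needed for insertion can be sketched in C++ as follows. The names Node, insertAfter and insertAtHead are hypothetical, and the new node is assumed to have been allocated already by the caller.

```cpp
struct Node {
    int data;
    Node* link;
};

// Insert newNode directly after prev. The order of the two assignments
// matters: the new entry must record its successor before the
// predecessor's pointer is overwritten.
inline void insertAfter(Node* prev, Node* newNode) {
    newNode->link = prev->link;   // new entry points to the entry that follows
    prev->link = newNode;         // preceding entry points to the new entry
}

// Insert newNode at the beginning of the list; here the pointer that
// changes is the head pointer itself.
inline void insertAtHead(Node*& head, Node* newNode) {
    newNode->link = head;
    head = newNode;
}
```

No existing entry is moved in memory; only pointers change, which is the advantage over a contiguous list.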
15.4.1. Circular Linked List:
The linked list discussed above is of a linear nature and is called a linear
linked list. The elements of such a linked list are accessed by first setting
a pointer to the first node in the list and then traversing the entire list
using this pointer. A linear linked list is useful in several ways but has the
disadvantage that, given a pointer P to a node in a linear list, it is not
possible to reach any node that precedes the node to which P is pointing. This
problem can be eliminated by making a small change: the link or pointer of the
last node contains a pointer to the first node instead of NULL. This
arrangement gives the list the name circular list. The structure of a circular
linked list is given below.
Figure: A Circular Linked List
From any point in a circular list it is possible to reach any other point in
the list. A circular linked list does not have a first or last node. A circular
linked list can be used to represent a stack and a queue.
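Because a circular list has no NULL pointer to stop at, a traversal has to stop when it returns to the node where it began. A minimal C++ sketch of this idea follows; the names Node and traverseCircular are hypothetical.

```cpp
#include <vector>

struct Node {
    int data;
    Node* link;   // in a circular list, the last node links back to the first
};

// Visit every node of a circular list exactly once, starting from any node.
// The loop ends when the traversal comes back to the starting node.
inline std::vector<int> traverseCircular(const Node* start) {
    std::vector<int> out;
    if (start == nullptr) return out;   // empty list
    const Node* p = start;
    do {
        out.push_back(p->data);
        p = p->link;
    } while (p != start);
    return out;
}
```

Starting from the second node of a three-node list visits the second, third and first nodes in that order, which shows that every node is reachable from any other.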
15.4.2. Doubly Linked List:
The previously mentioned lists contain nodes that provide information only
about the next node in the list. A node has no knowledge of where the previous
node lies in memory. If we are at the 17th node in the list, then to reach the
16th node we need to traverse the list right from the first node. To make this
easy, we can store in each node the address of the previous node along with
the address of the next node. This arrangement is often referred to as a
Doubly Linked List, and its node is shown below.
[ Prev | Data | Next ]
Figure : Node of a Doubly Linked List
Figure : An example for a Doubly Linked List (node data: 20, 18, 15, 28)
A doubly linked list is implemented in C++ using a structure with one data
part and two pointer parts, one for storing the address of the next node and
the other for storing the address of the previous node.
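Such a structure might look like the following C++ sketch. The names DNode and linkAfter are hypothetical, and the data part is assumed to be an integer.

```cpp
struct DNode {
    int data;
    DNode* prev;   // address of the previous node, or nullptr at the front
    DNode* next;   // address of the next node, or nullptr at the end
};

// Link node b directly after node a, keeping both directions consistent.
inline void linkAfter(DNode* a, DNode* b) {
    b->prev = a;
    b->next = a->next;
    if (a->next != nullptr) {
        a->next->prev = b;   // the old successor now points back to b
    }
    a->next = b;
}
```

With the prev pointer available, moving from the 17th node to the 16th is a single step (p = p->prev) instead of a traversal from the first node. The cost is one extra pointer per node and one or two extra pointer updates per insertion.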
A common application of linked lists is to store polynomials.
Polynomials like 8x^4 + 3x^3 - 6x^2 + 15x - 23 can be stored using a linked
list. To do this, each node should consist of three elements: a coefficient,
an exponent and a link to the next node. Here it is assumed that the exponent
of each successive term is less than that of the previous term. Once a linked
list is built to represent a polynomial, that list can be used to perform
common polynomial operations like addition, subtraction and multiplication.
For example,
To perform addition of two polynomials contained in poly1 and poly2,
both lists are traversed from the start until the end of one of them is
reached. During the traversal, they are compared on a term by term basis. If
the exponents of the two terms being compared are equal, then their
coefficients are added and the sum is stored in the third polynomial poly3,
and the traversal advances in both lists. If the exponents of the two terms
are not equal, then the term with the bigger exponent is added to the third
polynomial, and the traversal advances only in the list that supplied it.
When the end of one of the lists is reached, the remaining terms of the other
polynomial, whose end has not been reached yet, are simply appended to the
third polynomial. The result is obtained by displaying the third polynomial
poly3.
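The addition procedure can be sketched in C++ as follows. This is an illustrative sketch rather than a definitive implementation: the names Term, Poly, append, addPolynomials and toString are hypothetical, std::unique_ptr is used for the link part so that nodes are released automatically, and a term whose coefficients sum to 0 is kept in the result because the text does not say to drop it.

```cpp
#include <memory>
#include <string>

// One term of a polynomial: coefficient, exponent and link to the next term.
struct Term {
    int coefficient;
    int exponent;
    std::unique_ptr<Term> link;
};
using Poly = std::unique_ptr<Term>;

// Create a term in the empty slot *tail and return the slot after it.
inline Poly* append(Poly* tail, int coefficient, int exponent) {
    tail->reset(new Term{coefficient, exponent, nullptr});
    return &(*tail)->link;
}

// Add two polynomials whose terms are stored in decreasing order of exponent.
inline Poly addPolynomials(const Term* p1, const Term* p2) {
    Poly poly3;
    Poly* tail = &poly3;
    while (p1 != nullptr && p2 != nullptr) {
        if (p1->exponent == p2->exponent) {         // equal: add coefficients
            tail = append(tail, p1->coefficient + p2->coefficient, p1->exponent);
            p1 = p1->link.get();
            p2 = p2->link.get();
        } else if (p1->exponent > p2->exponent) {   // copy the bigger exponent
            tail = append(tail, p1->coefficient, p1->exponent);
            p1 = p1->link.get();
        } else {
            tail = append(tail, p2->coefficient, p2->exponent);
            p2 = p2->link.get();
        }
    }
    // One list is exhausted: append the remaining terms of the other.
    const Term* rest = (p1 != nullptr) ? p1 : p2;
    for (; rest != nullptr; rest = rest->link.get()) {
        tail = append(tail, rest->coefficient, rest->exponent);
    }
    return poly3;
}

// Render the terms as "(coefficient, exponent)" pairs for inspection.
inline std::string toString(const Term* p) {
    std::string out;
    for (; p != nullptr; p = p->link.get()) {
        if (!out.empty()) out += " ";
        out += "(" + std::to_string(p->coefficient) + ", " +
               std::to_string(p->exponent) + ")";
    }
    return out;
}
```

For instance, adding 8x^4 + 3x^3 - 6x^2 + 15x - 23 to a second, made-up polynomial 5x^3 + 6x^2 + 7 yields the terms (8, 4) (8, 3) (0, 2) (15, 1) (-16, 0).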