US03CBCA03 (Advanced Data & File Structure) Unit - I
CHARUTAR VIDYA MANDAL’S SEMCOM, Vallabh Vidyanagar
Faculty Name: Ami D. Trivedi
Class: SYBCA
Subject: US03CBCA03 (Advanced Data & File Structure)

*UNIT – 1 (ARRAYS AND TREES)

**INTRODUCTION TO ARRAYS
If we want to store a group of data items together in one place, an array is a suitable data structure. It lets us arrange more than one element under a single name, which is why it is known as a composite data structure. All the elements of an array are stored in contiguous locations of memory. Figure-1 shows an array of data stored in a memory block starting at location 453.
Figure-1

Definition: An array is a finite, ordered collection of homogeneous data elements.
An array is finite because it contains only a limited number of elements.
An array is ordered because its elements are stored one after another in contiguous locations of computer memory, in a linear order.
An array is a collection of homogeneous (identical) elements because all its elements are of the same data type.
An array is known as a linear data structure because all its elements are stored in a linear order.
e.g.
1. an array of integers to store the ages of all students in a class
2. an array of strings to store the names of all villagers in a village

Terminology:
1. Size: The number of elements in an array is called the size of the array. It is also known as the dimension or length.
2. Type: The type of an array is the data type of the elements it is meant (designed) to hold, e.g. array of integers, array of characters.
3. Base: The base of an array is the address of the memory location where the first element of the array is stored, e.g. 453 is the base address of the array in Figure-1.
4. Index: Each element of an array can be referred to by a subscript such as Ai or A[i]; this subscript is known as the index. An index is always an integer value.
5. Range of indices: The indices of an array range from a lower bound (L) to an upper bound (U). These bounds are called the boundaries of the array. If the index ranges over L...U, the size of the array is:
Size(A) = U – L + 1

**ONE DIMENSIONAL ARRAYS
A one dimensional array is one in which only one subscript is needed to specify a particular element of the array.

Declaration of one dimensional array
A one dimensional array is declared as follows:
data_type var_name [expression];
data_type is the type of the elements to be stored in the array.
var_name is the name of the array. It can be any valid name, like other simple variables.
expression (the subscript) specifies the number of values to be stored in the array.
e.g. int num[10];
This declares an array named num that stores ten integer values. It can be visualized as in Figure-2.
Figure-2

Initializing one dimensional array
Array variables can be initialized in their declarations by constant initializers. These initializing expressions must be constant values; expressions containing identifiers or function calls may not be used in the initializers. (C99 and later relax this rule for arrays declared inside a function without static; it still applies to arrays with static storage duration.) The initializers are written within braces and separated by commas. Elements that receive no initializer are set to 0.
e.g.
int ex[10] = {12, 23, 9, 17, 16, 49};
char word[10] = {'h', 'e', 'l', 'l', 'o', '\0'};
String initializers may be written as string constants instead of character constants within braces.
e.g.
char mesg[ ] = "This is a message";
char name[20] = "Hello";
In the case of mesg[ ], enough memory is allocated to hold the string plus a terminating null character ('\0'), so we do not need to specify the size of the array.

Accessing one dimensional array elements
Individual elements of the array are accessed using the following syntax:
array_name [index or subscript]
Examples:
1. To access the fourth element of the array we write: num[3]
The subscript of the fourth element is 3 because the lower bound of an array in C is 0.
2. To assign a value to the second location of the array we write: num[1] = 90;
3. To read a particular value we write: scanf("%d", &num[3]);
This statement reads a value from the keyboard and stores it in the fourth location of the array.

**ADDRESS CALCULATION OF ELEMENTS OF ONE DIMENSIONAL ARRAYS
Memory allocation for a one dimensional array
Suppose an array A[100] is to be stored in memory as in Figure-3, and let M be the memory location where the first element is stored.
Figure-3 Physical representation of a one-dimensional array
An array can be written as A[L..U], where L denotes the lower bound and U the upper bound of the index.

*TO FIND ADDRESS OF AN ELEMENT IN ONE DIMENSIONAL ARRAY WITH LOWER BOUND = 1
If each element occupies 1 word (byte), the location of any element, say A[i], is:
Address(A[i]) = M + (i – 1)
where M = base address of the array and i = subscript of the element whose address is required.
Note: The lower bound of the array is assumed to be 1.

If each element occupies c words (bytes):
Address(A[i]) = M + (i – 1) * c
where M = base address of the array, i = subscript of the element whose address is required and c = size of an element.
OR
LOC(Ai) = L0 + (i – 1) * c
where L0 = base address of the array, i = subscript of the element whose address is required and c = size of an element.
Note: The lower bound of the array is assumed to be 1.

*TO FIND ADDRESS OF AN ELEMENT IN ONE DIMENSIONAL ARRAY FOR ANY VALUE OF LOWER BOUND
METHOD - 1
If the array is stored starting from memory location M and each element requires w words, the address of A[i] is:
Address(A[i]) = M + (i – L) * w
where M = base address of the array, i = subscript of the element whose address is required, L = lower bound of the array and w = size of an element. Here the lower bound L can be any value.
The above formula is known as the indexing formula; it maps the logical representation of an array to its physical representation. Knowing the starting address M, the location of the ith element can be calculated directly instead of moving from M towards i element by element (Figure-4).
Figure-4 Address mapping between logical and physical view

Example
Suppose that each element requires 2 words (bytes), the base address of the array a[10] is 100 and the lower bound of the array is 1. Find the address of the following elements:
1. a[4]
2. a[7]
Solution
We are given:
Base address of the array is 100, so M = 100.
Lower bound of the array is 1, so L = 1.
Each element requires 2 words (bytes), so w = 2.
Formula: Address(A[i]) = M + (i – L) * w
1. Address(a[4]) = 100 + (4 – 1) * 2 = 106
2. Address(a[7]) = 100 + (7 – 1) * 2 = 112

METHOD - 2
If the array is stored starting from memory location L0 and each element requires c words, the location of Ai is:
LOC(Ai) = L0 + (i – b) * c
where L0 = base address of the array, i = subscript of the element whose address is required, b = lower bound of the array and c = size of an element. Here the lower bound b can be any value.
This address is known as the relative address with real values.
The distance from the first element to any particular element, (i – b) * c, is known as the displacement (offset); it is used where the value of L0 is not known.

**TWO DIMENSIONAL ARRAY
Definition: Two dimensional arrays are collections of homogeneous elements ordered in a number of rows and columns. Two dimensional arrays are also known as matrices. An m X n matrix, where m denotes the number of rows and n the number of columns, is shown in Figure-5.
Figure-5
The subscripts of any arbitrary element, say Aij, represent the ith row and the jth column.

Memory representation of a matrix
Like a one dimensional array, a matrix is stored in contiguous memory locations. There are two conventions for storing a matrix in memory:
1. Row major order
2. Column major order
In row major order, the elements of the matrix are stored row by row, i.e. all the elements of the first row, then all the elements of the second row, and so on.
In column major order, the elements of the matrix are stored column by column, i.e. all the elements of the first column in order of rows, then those of the second column, and so on.
Ex. Consider the 3 X 4 matrix A in Figure-6. It can be stored in memory as shown in Figure-7.
Figure-6
Figure-7

Reference of elements in a matrix
Logically a matrix appears two dimensional, but physically it is stored in a linear fashion. To map the logical view to the physical structure we need an indexing formula, and the indexing formula is different for each storage order.

*ROW MAJOR ORDER – ADDRESS CALCULATION
Assume that the base address is 1 (the first location of memory). The address of Aij is then obtained by counting "all the elements in the first (i – 1) rows + the number of elements in the ith row up to the jth column", i.e.
Address(Aij) = (i – 1) * n + j
So for the matrix A3X4, the location of A32 is (3 – 1) * 4 + 2 = 10 (see Figure-7).

TO FIND ADDRESS OF AN ELEMENT IN TWO DIMENSIONAL ARRAY WITH LOWER BOUND = 1 FOR BOTH ROWS AND COLUMNS
Note: This formula assumes that the lower bound of both i (row) and j (column) is 1.
Address(Aij) = M + ((i – 1) * n + j – 1) * w
where M = base address of the array, i = row subscript of the element whose address is required, j = column subscript of that element, n = number of columns and w = size of an element.
OR
LOC(Aij) = L0 + [(i – 1) * n + (j – 1)] * c
where L0 = base address of the array, i = row subscript of the element whose address is required, j = column subscript of that element, n = number of columns and c = size of an element.

TO FIND ADDRESS OF AN ELEMENT IN TWO DIMENSIONAL ARRAY FOR ANY VALUE OF LOWER BOUND AND UPPER BOUND
METHOD – 1
Address(Aij) = M + ((i – L1) * n + j – L2) * w
where M = base address of the array, i = row subscript of the element whose address is required, L1 = lower bound of i (row), j = column subscript of that element, L2 = lower bound of j (column), n = number of columns and w = size of an element.
METHOD – 2
LOC(Aij) = L0 + [(i – b1) * (u2 – b2 + 1) + (j – b2)] * c
where L0 = base address of the array, i = row subscript of the element whose address is required, b1 = lower bound of i (row), j = column subscript of that element, b2 = lower bound of j (column), u2 = upper bound of j (column) and c = size of an element. The term (u2 – b2 + 1) is the number of columns n.

*COLUMN MAJOR ORDER – ADDRESS CALCULATION
Assume that the base address is 1 (the first location of memory).
So the address of Aij is obtained by counting "all the elements in the first (j – 1) columns + the number of elements in the jth column up to the ith row", i.e.
Address(Aij) = (j – 1) * m + i
So for the matrix A3X4, the location of A32 is (2 – 1) * 3 + 3 = 6 (see Figure-7).

TO FIND ADDRESS OF AN ELEMENT IN TWO DIMENSIONAL ARRAY WITH LOWER BOUND = 1 FOR BOTH ROWS AND COLUMNS
Note: This formula assumes that the lower bound of both i (row) and j (column) is 1.
Address(Aij) = M + ((j – 1) * m + i – 1) * w
where M = base address of the array, j = column subscript of the element whose address is required, i = row subscript of that element, m = number of rows and w = size of an element.
OR
LOC(Aij) = L0 + [(j – 1) * m + (i – 1)] * c
where L0 = base address of the array, j = column subscript of the element whose address is required, i = row subscript of that element, m = number of rows and c = size of an element.

TO FIND ADDRESS OF AN ELEMENT IN TWO DIMENSIONAL ARRAY FOR ANY VALUE OF LOWER BOUND AND UPPER BOUND
METHOD – 1
Address(Aij) = M + ((j – L2) * m + i – L1) * w
where M = base address of the array, j = column subscript of the element whose address is required, L2 = lower bound of j (column), i = row subscript of that element, L1 = lower bound of i (row), m = number of rows and w = size of an element.
METHOD – 2
LOC(Aij) = L0 + [(j – b2) * (u1 – b1 + 1) + (i – b1)] * c
where L0 = base address of the array, j = column subscript of the element whose address is required, b2 = lower bound of j (column), i = row subscript of that element, b1 = lower bound of i (row), u1 = upper bound of i (row) and c = size of an element. The term (u1 – b1 + 1) is the number of rows m.

EXAMPLE
Assume that the base address of the two dimensional array A[3][3] is 100 and each element requires 2 bytes (words). Find the address of the element A[3][2] using
1. Row major order
2. Column major order
Solution
We are given:
Base address of the array is 100, so M = 100.
Each element requires 2 words (bytes), so w = 2.
Number of rows is 3, so m = 3.
Number of columns is 3, so n = 3.
Lower bound of i (row): L1 = 1
Lower bound of j (column): L2 = 1
We need the address of A[3][2], so i = 3 and j = 2.
1. Row major
Address(A[3][2]) = M + ((i – L1) * n + j – L2) * w
= 100 + ((3 – 1) * 3 + 2 – 1) * 2
= 114
2. Column major
Address(A[3][2]) = M + ((j – L2) * m + i – L1) * w
= 100 + ((2 – 1) * 3 + 3 – 1) * 2
= 110

SPARSE MATRICES
A useful application of linear lists is the representation of matrices that contain a preponderance (a large majority) of zero elements. These matrices are called sparse matrices, i.e. matrices in which the number of zero elements is much higher than the number of non-zero elements. They commonly arise in scientific applications and may contain hundreds or even thousands of rows and columns. Storing such large matrices with the conventional two-dimensional (sequential) allocation wastes storage, and operations on them are inefficient.
Of the 42 elements in the 6 X 7 matrix of Figure-8, only 10 are non-zero. They are:
A[1,3] = 6, A[1,5] = 9, A[2,1] = 2, A[2,4] = 7, A[2,5] = 8, A[2,7] = 4, A[3,1] = 10, A[4,3] = 12, A[6,4] = 3, A[6,7] = 5
Figure-8
One of the basic methods for storing such a sparse matrix is to store the non-zero elements in a one dimensional array and to identify each array element with its row and column indices, as shown in Figure-9 (a).
Figure-9 (a) and Figure-9 (b) Sequential representation of sparse matrices
The ith element of vector A is the matrix element with row and column indices Row(i) and Column(i). A more efficient representation, in terms of storage requirements and access time to the rows of the matrix, is shown in Figure-9 (b). For large matrices the saving in storage is very significant. The sequential allocation scheme for sparse matrices is also valuable because matrix operations can be executed faster than with a conventional two-dimensional array representation, particularly when the matrices are large.

APPLICATION OF ARRAYS
1. Arrays are used to implement mathematical vectors and matrices, as well as other kinds of rectangular tables. Many databases, small and large, consist of (or include) one-dimensional arrays whose elements are records.
2. Arrays are used to implement other data structures, such as heaps, hash tables, deques, queues, stacks, strings and VLists.
3. One or more large arrays are sometimes used to emulate dynamic memory allocation within a program, particularly memory pool allocation. Historically, this has sometimes been the only way to allocate "dynamic memory" portably.
4. Arrays can be used to determine partial or complete control flow in programs, as a compact alternative to otherwise repetitive multiple IF statements. In this context they are known as control tables and are used together with a purpose-built interpreter whose control flow is altered according to the values contained in the array. The array may contain subroutine pointers (or relative subroutine numbers that can be acted upon by SWITCH statements) that direct the path of execution.
5. Arrays have wide applications in computation, which is why almost every programming language includes the array as a built-in data type.
6. Suppose you want to store records of all students in a class.
The record structure is given by:
STUDENT
Field      Type
Roll No.   Alphanumeric
Mark1      Numeric
Mark2      Numeric
Mark3      Numeric
Total      Numeric
Grade      Character
Figure-10
If storing the records sequentially is acceptable, we can store them by maintaining 6 arrays, one per field, each with as many elements as there are students in the class, as in Figure-10.
7. Arrays can be used to represent polynomials so that mathematical operations on them can be performed efficiently.

Arrays used to store Polynomials
A two-dimensional array can be used to represent a polynomial in two variables. In a programming language that allows subscripts to start at zero (like C), the coefficient of the term x^i y^j is stored in the element at row i and column j of the array. If we restrict the array to a maximum of 5 rows and 5 columns, the powers of x and y must not exceed 4 in any term of the polynomial. The array representing the polynomial 2x^2 + 5xy + y^2 is:
        j=0  j=1  j=2  j=3  j=4
i=0      0    0    1    0    0
i=1      0    5    0    0    0
i=2      2    0    0    0    0
i=3      0    0    0    0    0
i=4      0    0    0    0    0

LIMITATIONS OF LINEAR ARRAYS
1. The number of elements in an array must be known in advance.
2. Arrays are static structures, i.e. their size is fixed at compilation time, so an array's memory cannot be reduced or expanded later.
3. Insertion or deletion of an element in an array is time consuming because it requires a large amount of data movement.

INTRODUCTION TO TREES
Arrays, stacks, queues and linked lists are known as linear data structures because their elements are arranged in a linear fashion. Another very useful data structure is the tree, in which elements are arranged in a non-linear fashion.
Tree: A tree is a non-linear data structure used to represent the hierarchical relationships that exist among several data items.
A tree is a finite set of one or more nodes such that
1. There is a specially designated node called the root.
2. The remaining nodes are partitioned into n (n ≥ 0) disjoint sets T1, T2, T3, ..., Tn, where each Ti (i = 1, 2, ..., n) is a tree; T1, T2, T3, ..., Tn are called the subtrees of the root.
Figure-11

BASIC TERMINOLOGIES
1. Root
It is a specially designated data item in a tree. It is the first in the hierarchical arrangement of data items.
OR
It is a specially designated node that has no parent. In the above tree, A is the root.
2. Node
Each data item in a tree is called a node. It is the basic structure in a tree: it stores the actual data and links to other nodes. Figure-12(a) represents the structure of a node.
Figure-12(a)
3. Degree of Node
It is the number of subtrees of a node in a given tree.
4. Parent
The parent of a node is its immediate predecessor. Here, X is the parent of Y and Z, as shown in Figure-12(b).
Figure-12(b)
5. Leaf or Terminal node
A node at the end of a branch, which does not have any child, is called a leaf node. In Figure-12(c), H, I, K, L and M are leaf nodes. A leaf node is also known as a terminal node; its degree is zero.
Figure-12(c)
6. Level
Level is the rank in the hierarchy. The tree is leveled so that the root node is always at level 0, its immediate children are at level 1, their immediate children are at level 2, and so on down to the terminal nodes. If a node is at level l (el), then its children are at level l + 1 and its parent is at level l – 1. This is true for all nodes except the root node, which has no parent.
7. Siblings
Nodes that have the same parent are called siblings. They are also called brothers. In the above tree, E and F are siblings with parent node B.
8. Height
The maximum number of nodes on a path from the root node to a leaf node is called the height of the tree. Since the root is at level 0, the height is one more than the maximum level of any node in the tree. The term depth is also used for height.
9. Branch
A branch is a link between a parent and its child.
10. Edge
It is the connecting line between two nodes. The line drawn from one node to another is called an edge.
11. Path
It is a sequence of consecutive edges from a source node to a destination node. In the above tree, the path between A and J is given by the node pairs (A,B), (B,F) and (F,J).
12. Forest
It is a set of disjoint trees. If you remove the root node of a tree, it becomes a forest. Removing the root of the above tree leaves a forest of three trees.
13. Directed Tree
It is an acyclic digraph that has one node, called its root, whose indegree is 0, while all other nodes have indegree 1.

Applications of tree data structure
1. Trees can be used to store information that naturally forms a hierarchy, for example the file system on a computer:
File system
-----------
        college
        /     \
      bba     bca
             /    \
         course   class
                 /  |  \
               fy   sy   ty
Figure-13
2. Trees can hold objects that are sorted by their keys. The nodes are ordered so that all keys in a node's left subtree are less than the key of the object at the node, and all keys in the node's right subtree are greater than the key of the object at the node. Here is an example of a tree of records, where each record is stored with its integer key in a tree node:
Figure-14
3. Trees can hold objects that are located by keys that are sequences. For example, we might have some books with the following catalog numbers:
Catalog number   Book
QA76             book1
QA7              book2
Q17              book3
B1               book4
Z4               book5
Figure-15
The books' keys are sequences, and the sequences label the branches of a tree that holds the books:
                 *
                 |
      +----------+----------+
     B|         Q|         Z|
      *          *          *
     1|        1/ \A       4|
    book4      *   *      book5
              7|   |7
            book3 book2
                   |6
                 book1
Figure-16
Books can be stored at nodes or leaves, and not all nodes hold a book (e.g. Q1).
This tree is called a spelling tree, and it has the advantage that the insertion and retrieval time of an object depends only on the length of the key.
4. A tree can represent a structured object, such as a house that must be explored by a robot or a human player in an adventure game:
house's entrance ---- upper hallway ---- bedroom --- closet ---...
       |                    |               |
       |                    |               +----- private bath ---...
       |                    |
       |                    +--- study ---...
       |
lower hallway --- kitchen ---...
       |
       +--- lounge ---...
Figure-17
5. Trees are used to represent the phrase structure of sentences, which is crucial to language-processing programs. Here is the phrase structure tree (parse tree) for the Java statements:
int x; x = 3 + y;
                  STATEMENT SEQUENCE
                  /                \
          DECLARATION            ASSIGNMENT
           /       \             /        \
        TYPE    VARIABLE    VARIABLE   EXPRESSION
         |         |           |       /    |    \
        int        x           x   NUMERAL  +  VARIABLE
                                      |            |
                                      3            y
Figure-18
6. An operating system maintains a disk's file system as a tree, where file folders act as tree nodes.
Figure-19
7. If we organize keys in the form of a tree with some ordering, e.g. a BST (Binary Search Tree), we can search for a given key in moderate time (quicker than a linked list and slower than indexed access in an array).
8. We can insert and delete keys in moderate time (quicker than arrays and slower than unordered linked lists).
9. Like linked lists and unlike arrays, a pointer implementation of trees has no upper limit on the number of nodes, because nodes are linked using pointers.
10. According to Wikipedia, the following are common uses of trees:
   1. Manipulate hierarchical data.
   2. Make information easy to search (see tree traversal).
   3. Manipulate sorted lists of data.
   4. As a workflow for compositing digital images for visual effects.
   5. Router algorithms.

Exercise
1. Assume that 4 bytes of storage are required to hold each element of a one dimensional array A[10]. Further assume that the storage for the array begins at location 501 in memory. Give the actual address of the elements A[3] and A[9].
2. Assume that 2 bytes of storage are required to hold each element of a one dimensional array A[10]. Further assume that the storage for the array begins at location 101 in memory and that the subscript limits of the array are -5 ≤ i ≤ 4. Calculate the actual address of the elements A[-3], A[-1] and A[4].
3. Suppose an array A[-15 … 64] is stored in memory starting at address 459. Assume that the word size of each element is 2. What is the location of A[50]?
4. Assume that 8 bytes of storage are required to hold each element of a two dimensional array A[20][20]. Further assume that the storage for the array begins at location 1000 in memory. Give the actual address of the elements A[6][4] and A[15][9] when
i. the array is stored in row major order
ii. the array is stored in column major order
5. Find the address of the elements a[1,3] and a[4,5] where
i. the storage representation of array a is row major
ii. the subscript limits of the array are 0 ≤ i ≤ 5, 1 ≤ j ≤ 6
iii. storage for the array begins at 100 bytes in memory
iv. 2 bytes of storage are required to hold each element of the array
6. A two dimensional array defined as a[4…7, -1…3] requires 2 bytes of storage per element. If the array is stored in row major form, calculate the address of the element a[6,2]. The base address is 100.
7. Each element of an array a[-20…20, 10…35] requires 1 byte of storage. If the array is implemented in column major order and the beginning of the array is at location 500 (base address), determine the address of the element a[0,30].
8. Give the sequential and the efficient representation of the following sparse matrix B.
0 9 0 0 20 0 5 0 0 0 6 0 0 7 0 0 0 7 0 0 0 4 0 0 8 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 13 1 0 0 0 2 0 0 0 3 0 0 0 0 0

Disclaimer
The study material is compiled by Ami D. Trivedi. The basic objective of this material is to supplement teaching and discussion in the classroom in the subject.
Students are required to do extra reading in the subject through library work.