Data Structure and Algorithm Question and Answer Solved (2006)

Q: What is a data structure? What is the difference between primitive and non-primitive data structures?

A: In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently. Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, B-trees are particularly well suited to the implementation of databases, while compiler implementations usually use hash tables to look up identifiers.

Data structures are used in almost every program or software system. Specific data structures are essential ingredients of many efficient algorithms, and make possible the management of huge amounts of data, such as large databases and internet indexing services. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design.

Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address, which is a bit string that can itself be stored in memory and manipulated by the program. Thus the record and array data structures are based on computing the addresses of data items with arithmetic operations, while the linked data structures are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways (as in XOR linking).

Primitive data structures are the basic data structures that are operated upon directly by machine instructions. They cannot be further broken down into smaller data items. Examples are int, float and char.

Non-primitive data structures can be classified as arrays, lists, and files. An array is an ordered set which contains a fixed number of objects. No deletions or insertions are performed on arrays. At best, elements may be changed.
A list, by contrast, is an ordered set consisting of a variable number of elements to which insertions and deletions can be made, and on which other operations can be performed. When a list displays the relationship of adjacency between elements, it is said to be linear; otherwise it is said to be nonlinear. A file is typically a large list that is stored in the external memory of a computer. Additionally, a file may be used as a repository for list items (records) that are accessed infrequently.

Q: Explain the different operations performed on data structures.

A: The data appearing in a data structure is processed by means of certain operations. In fact, the particular data structure that one chooses for a given situation depends largely on the frequency with which specific operations are performed. The following four operations play a major role:

Traversing: Accessing each record exactly once so that certain items in the record may be processed. (This accessing or processing is sometimes called "visiting" the records.)

Searching: Finding the location of the record with a given key value, or finding the locations of all records that satisfy one or more conditions.

Inserting: Adding new records to the structure.

Deleting: Removing a record from the structure.

Two additional operations are also performed in some situations:

Sorting: Arranging the records in some logical order, e.g. alphabetical order for names or numerical order for numbers.

Merging: Combining the records in two different sorted files into a single sorted file.

Sometimes two or more of these operations are used together in a given situation; e.g., we may want to delete the record with a given key, which means we first need to search for the location of the record.

Q: What is an array? Explain the address calculation in single and multidimensional arrays.

A: An array is a data structure (a type of memory layout) that stores a collection of individual values that are all of the same data type.
Arrays are useful because instead of having to separately store related information in different variables (named memory locations), you can store them, as a collection, in just one variable. It is more efficient for a program to access and process the information in an array than it is to deal with many separate variables.

All of the items placed into an array are automatically stored in adjacent memory locations. As the program retrieves the value of each item (or "element") of an array, it simply moves from one memory location to the very next, in a sequential manner. It doesn't have to keep jumping around to widely scattered memory locations in order to retrieve each item's value.

Imagine if you had to store, and later retrieve, the names of all of the registered voters in your city. You could create and name hundreds of thousands of distinct variables to store the information. That would scatter hundreds of thousands of names all over memory. An alternative is to create one variable that stores the same information, but in sequential memory locations.

For example, if you have a class with five students, and you want to store their test grades, you will create an array of the integer data type. Since you have five students, you will create a single array of five elements. This sets aside five sequential memory locations to hold the five scores. Each score is stored as an "element" in the array. The first score will be stored at location (or "index") zero. The second score will be stored at index one. The third score will be stored at index two, and so on.

Let's name the array "Scores". The student grades are: 70, 75, 80, 85, and 90. Let's store (or "assign") the first grade: Scores(0) = 70. Now, the second grade: Scores(1) = 75. Assign the third grade: Scores(2) = 80, and so on. Now the program can access the array Scores to get the value of each score.
Address Calculation in a One-Dimensional Array

The address of an element of a one-dimensional array is easy to calculate. Consider an array A of 25 elements, A[1], A[2], ..., A[25], and suppose we need the address of A[4]. If the first cell, A[1], is at address 15, then A[4] is located at 15 + (4 - 1) = 18, as shown below. We assume that each element occupies one unit of storage.

    Address   Memory cell
    15        A[1]
    16        A[2]
    17        A[3]
    18        A[4]

Therefore it is necessary to know the starting address of the space allocated to the array and the size of each element, which is the same for all elements of an array. We call the starting address the base address and denote it by B. The location of the Ith element is then

    Address of A[I] = B + (I - 1) * S

Note that this holds only when the lower bound of the subscript is one. Consider instead an array A[4], ..., A[10]. In this case the location of the Ith element is B + (I - 4) * S. In general,

    Address of A[I] = B + (I - L) * S

where
B = base address
I = subscript of the required element
S = size of each element of the array
L = lower bound of the subscript (may be positive, negative or zero)

The address calculation can also be explained with the help of a program. Suppose num is the name of a one-dimensional array declared as int num[10]. The address of the element with index i is given by the expression (num + i). This is illustrated in the following program:

    /* Address calculation for one-dimensional arrays */
    #include <stdio.h>

    int main(void)
    {
        int a[10], i, n;

        printf("Enter the total no. of elements: ");
        if (scanf("%d", &n) != 1 || n < 1 || n > 10)
            return 1;   /* a[] holds at most 10 elements */

        printf("Enter the elements one by one: ");
        for (i = 0; i < n; i++)
            scanf("%d", &a[i]);

        printf("The given elements are\n");
        for (i = 0; i < n; i++) {
            /* a + i is the address of a[i]; *(a + i) is its value */
            printf("Address of %d element=%p\t its contents a[%d]=%d\n",
                   i, (void *)(a + i), i, *(a + i));
        }
        return 0;
    }

Sample input and output (recorded on a compiler where an int occupies 2 bytes; actual addresses differ between systems, and %p typically prints them in hexadecimal):

    Enter the total no. of elements: 5
    Enter the elements one by one: 45 34 23 12 78
    The given elements are
    Address of 0 element=65502    its contents a[0]=45
    Address of 1 element=65504    its contents a[1]=34
    Address of 2 element=65506    its contents a[2]=23
    Address of 3 element=65508    its contents a[3]=12
    Address of 4 element=65510    its contents a[4]=78

Formula for Address Calculation in a Two-Dimensional Array (Row-Major Form)

    Address of A[I][J] = B + (I - L1) * (U2 - L2 + 1) * S + (J - L2) * S

Formula for Address Calculation in a Two-Dimensional Array (Column-Major Form)

    Address of A[I][J] = B + (J - L2) * (U1 - L1 + 1) * S + (I - L1) * S

In both formulas:
B = base address of the array
S = size of each element of the array
I = row subscript of the required element
J = column subscript of the required element
L1 = lower bound of the rows
U1 = upper bound of the rows
L2 = lower bound of the columns
U2 = upper bound of the columns

Q: Write an algorithm to insert, delete, traverse, sort and search an element in an array.

A: Representation of linear arrays in memory

The elements of a linear array are stored in successive memory cells. The number of memory cells required depends on the type of the data elements. The computer keeps track of the address of the first element of the array. This is known as the base address of the array. Using this base address the computer can calculate the address of any element.
Traversing linear arrays

Let LB be the lower bound of the array and UB the upper bound of the array.

1. Initialize counter C = LB.
2. While C <= UB, repeat steps 3 and 4; otherwise go to step 5.
3. Read the Cth element of the array.
4. Increment C by 1 (set C = C + 1). [End of loop]
5. Exit.

Inserting and deleting in arrays

Inserting and deleting elements at the end of the array is easy. To insert anywhere else, we need to shift the elements occurring after that location down to create space for the new element. Similarly, if we delete an element anywhere in the array, we need to move all the elements coming after that element up.

Algorithm to insert an element

Let LB be the lower bound of the array AR, UB the upper bound and P the position where we want to add an element. The array is assumed to have room for one more element at location UB + 1.

1. Initialize counter C = UB.
2. While C >= P, repeat steps 3 and 4; otherwise go to step 5.
3. Copy the value of the element stored in location C to location C + 1 (AR[C + 1] = AR[C]).
4. Decrement C by 1 (set C = C - 1). [End of loop]
5. Set the value of the element stored at P to the element to be added to the array.
6. Exit.

Algorithm to delete an element

Let LB be the lower bound of the array AR, UB the upper bound and P the position of the element we want to delete.

1. Initialize counter C = P.
2. While C < UB, repeat steps 3 and 4; otherwise go to step 5.
3. Copy the value of the element stored in location C + 1 to location C (AR[C] = AR[C + 1]).
4. Increment C by 1 (set C = C + 1). [End of loop]
5. Set the value of the element stored at UB to NULL (AR[UB] = NULL).
6. Exit.

Sorting arrays

Bubble sort algorithm

Bubble sort is a simple sorting algorithm. It works by repeatedly stepping through the list to be sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. The algorithm gets its name from the way smaller elements "bubble" to the top of the list.
Bubble sort has worst-case and average complexity both O(n²), where n is the number of items being sorted.

Let LB be the lower bound of the array AR and UB the upper bound.

1. Initialize counter I = LB.
2. While I < UB, repeat steps 3 to 7; otherwise go to step 8.
3. Set counter J = LB.
4. While J < UB, repeat steps 5 and 6.
5. Compare the values of the elements at the Jth and (J+1)th locations. If the value at the Jth location is greater than that at the (J+1)th location, swap the elements.
6. Increment J by 1. [End of inner loop]
7. Increment I by 1. [End of outer loop]
8. Exit.

Step-by-step example

Let us take the array of numbers "5 1 4 2 8" and sort it from lowest number to greatest number using the bubble sort algorithm. Each line shows the array before and after one comparison of adjacent elements.

First Pass:
( 5 1 4 2 8 ) -> ( 1 5 4 2 8 ), Here the algorithm compares the first two elements and swaps them.
( 1 5 4 2 8 ) -> ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) -> ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) -> ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), the algorithm does not swap them.

Second Pass:
( 1 4 2 5 8 ) -> ( 1 4 2 5 8 )
( 1 4 2 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )

Now the array is already sorted, but the algorithm does not know whether it is finished. It needs one whole pass without any swap to know that the array is sorted.

Third Pass:
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )
( 1 2 4 5 8 ) -> ( 1 2 4 5 8 )

Finally, the array is sorted, and the algorithm can terminate.

Searching in arrays

Linear Search

The linear search algorithm is one of the simplest algorithms for finding an element. In linear search the array is traversed and the element sought is compared with each element of the array.
In the worst case the number that we are searching for could be at the end of the array, and thus require n comparisons (where n is the length of the array) before the number is found or before we can confidently say that the number is not part of the array.

Let LB be the lower bound of the array AR, UB the upper bound of the array and E the element to be found.

1. Initialize counter C = LB.
2. While C <= UB, repeat steps 3 and 4; otherwise go to step 5.
3. Compare the Cth element of the array with E. If the values match, display that the number was found at the Cth location and go to step 5.
4. Increment C by 1 (set C = C + 1). [End of loop]
5. Exit.

Binary Search

Binary search is an example of a divide and conquer algorithm. It uses the knowledge that the array is sorted to decrease the number of comparisons needed before we find the number or can confidently say that the number does not exist in the array. In the worst case binary search requires about log2(n) comparisons.

Let LB be the lower bound of the array AR sorted in ascending order, UB the upper bound of the array and E the element to be found. Consider two variables BEG and END. BEG stands for the beginning (lower bound) and END for the end (upper bound) of the current range of locations that we are working with in the program.

1. Set BEG = LB and END = UB.
2. While BEG <= END, repeat step 3; otherwise go to step 4.
3. Set MID = (BEG + END)/2, using integer division, and compare the number stored at location MID with E.
   If the values match, display that the number was found at the MIDth location and go to step 4.
   If the number stored at MID is greater than E, the number can only be in the first half of the current range, hence set END = MID - 1.
   If the number stored at MID is less than E, the number can only be in the second half of the current range, hence set BEG = MID + 1.
   [End of loop]
4. Exit. If the loop ended because BEG became greater than END, the number is not in the array.

Q: What is the difference between sorting and searching? Explain the sorting techniques and their complexity analysis.

Q: What is hashing? Explain three techniques often built into hash functions.

A: Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms.

As a simple example of the use of hashing in databases, a group of people could be arranged in a database like this:

    Abernathy, Sara
    Epperdingle, Roscoe
    Moore, Wilfred
    Smith, David
    (and many more sorted into alphabetical order)

Each of these names would be the key in the database for that person's data. A database search mechanism would first have to start looking character by character across the name for matches until it found the match (or ruled the other entries out). But if each of the names were hashed, it might be possible (depending on the number of names in the database) to generate a unique four-digit key for each name.
For example:

    7864 Abernathy, Sara
    9802 Epperdingle, Roscoe
    1990 Moore, Wilfred
    8822 Smith, David
    (and so forth)

A search for any name would first consist of computing the hash value (using the same hash function used to store the item) and then comparing for a match using that value. It would, in general, be much faster to find a match across four digits, each having only 10 possibilities, than across an unpredictable value length where each character had 26 possibilities.

The hashing algorithm is called the hash function (and probably the term is derived from the idea that the resulting hash value can be thought of as a "mixed up" version of the represented value). In addition to faster data retrieval, hashing is also used to encrypt and decrypt digital signatures (used to authenticate message senders and receivers). The digital signature is transformed with the hash function and then both the hashed value (known as a message-digest) and the signature are sent in separate transmissions to the receiver. Using the same hash function as the sender, the receiver derives a message-digest from the signature and compares it with the message-digest it also received. They should be the same.

The hash function is used to index the original value or key and then used later each time the data associated with the value or key is to be retrieved. Thus, hashing is always a one-way operation. There's no need to "reverse engineer" the hash function by analyzing the hashed values. In fact, the ideal hash function can't be derived by such analysis. A good hash function also should not produce the same hash value from two different inputs. If it does, this is known as a collision. A hash function that offers an extremely low risk of collision may be considered acceptable.

Here are some relatively simple hash functions that have been used:

The division-remainder method: The number of items in the table is estimated. That number is then used as a divisor into each original value or key to extract a quotient and a remainder. The remainder is the hashed value. (Since this method is liable to produce a number of collisions, any search mechanism would have to be able to recognize a collision and offer an alternate search mechanism.)

Folding: This method divides the original value (digits in this case) into several parts, adds the parts together, and then uses the last four digits (or some other arbitrary number of digits that will work) as the hashed value or key.

Radix transformation: Where the value or key is digital, the number base (or radix) can be changed, resulting in a different sequence of digits. (For example, a decimal numbered key could be transformed into a hexadecimal numbered key.) High-order digits could be discarded to fit a hash value of uniform length.

Digit rearrangement: This is simply taking part of the original value or key, such as the digits in positions 3 through 6, reversing their order, and then using that sequence of digits as the hash value or key.

A hash function that works well for database storage and retrieval might not work as well for cryptographic or error-checking purposes. There are several well-known hash functions used in cryptography. These include the message-digest hash functions MD2, MD4, and MD5, used for hashing digital signatures into a shorter value called a message-digest, and the Secure Hash Algorithm (SHA), a standard algorithm that makes a larger (160-bit) message digest and is similar to MD4.