Programming Planning and Design of Solution (Data Type & Data Structure) V0.1 21/2/2007

Programming - Planning & Design of Solution (Data Type & Data Structure)

Contents
INTRODUCTION
DATA TYPES
DATA STRUCTURES AND ABSTRACT DATA TYPES
LIST
LINKED LIST
LINEAR LINKED LISTS
QUEUE
STACK
BINARY SEARCH TREE

Introduction

A program acts on data. Such data may be either previously stored in external memory or read in from an input device. Besides, programs often use temporary data, e.g. a counter for the number of iterations performed in a repetition structure, to support certain computations. Since all data items are represented in binary digits, programs need to be able to interpret each data item correctly. This is usually done by associating a data type with each data item. Data types can be simple, structured or user-defined. In strongly-typed programming languages like Pascal, type annotations are associated with variable names when they are declared.

Data Types

Virtually all programs deal with integers, decimal numbers, characters, text strings, etc. Most imperative programming languages allow programmers to associate such data items with their interpretation through data type declarations. Because of their simplicity, these data types are known as simple data types or primitive data types. For example, a variable that stores a student’s subject score in a test should be declared as an integer.

However, it would be tedious to declare a separate variable for each subject score in multiple tests. One solution is to use an array, which is a kind of structured data type. With the use of an index, an individual array element (e.g. an individual test score) can be accessed.

In brief, a value of a structured data type is not atomic: it is made up of component elements. The elements of a structured data type may or may not be homogeneous, i.e. of the same data type. In an array, all elements are of the same data type. However, a structured data type can also be defined with elements of different data types. For example, a student health record may store student name, student number, date of birth, address, medical history, etc. In fact, structured data types are often referred to as user-defined data types because they are defined to meet user needs, as in the student health record example.
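The difference between a homogeneous array and a heterogeneous record can be sketched in Python as follows. This is only an illustration: the name `StudentHealthRecord`, its field names and all the values are assumptions made for the example, and Python is used here in place of a strongly-typed language such as Pascal.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Homogeneous structured type: every element is an integer test score.
test_scores = [72, 85, 90]        # illustrative values
second_score = test_scores[1]     # the index selects one element (indices start at 0)


# Heterogeneous, user-defined structured type: fields of different data types.
@dataclass
class StudentHealthRecord:
    name: str
    student_number: str
    date_of_birth: date
    address: str
    medical_history: List[str] = field(default_factory=list)


# All field values below are made up for the example.
record = StudentHealthRecord(
    name="Example Student",
    student_number="S0001",
    date_of_birth=date(1990, 1, 31),
    address="1 Example Road",
)
record.medical_history.append("asthma")
```

Each field of the record is accessed by name (e.g. `record.date_of_birth`), whereas each array element is accessed by position.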
Data Structures and Abstract Data Types

In order to enable a program to execute efficiently, data are often organized in data structures so that a more efficient algorithm can be used to access them. Specific data structures can be designed to meet an individual program’s needs. However, there exist some widely used data structures, such as arrays and linked lists, for implementing abstract data types like lists, stacks and queues. An abstract data type is a specification of a set of data and the set of operations that can be performed on the data.

Teaching remark
Although lists, linked lists, stacks and queues are mentioned together in the ALCS Curriculum and Assessment Guide, in the literature lists, stacks and queues are usually considered instances of abstract data types, whereas linked lists are considered instances of data structures.

List

A list is defined as an ordered collection of entities. A list can be empty. A non-empty list is considered to be made of its first element, also called the head, and the remaining sub-list, known as the tail. Note that the head of a list is NOT a list but the tail of a list is a list. Some basic list operations are given below.

1. Create a list (which is empty)
2. Add a new element as the head of a list
3. Get the head of a list
4. Test whether a list is empty

Suppose (“Beatrice”, “Catherine”, “Dianna”) is a list. The head of the list is “Beatrice” whereas the tail of the list is (“Catherine”, “Dianna”). If “Aaron” is added to the list as the list head, the list becomes (“Aaron”, “Beatrice”, “Catherine”, “Dianna”). If a list contains only one element, its tail is an empty list.

Random access to list elements may or may not be possible, depending on the list implementation. By restricting which elements of a list can be accessed and how, more specific abstract data types are derived.
For example, queue and stack are also known as first-in-first-out (FIFO) list and last-in-first-out (LIFO) list respectively. More details on queue and stack will be given shortly. A list is often implemented with a linked list, although it is also possible to implement specific versions of lists in arrays.

Linked List

Wikipedia gives a succinct description of what linked lists are: a linked list is one of the fundamental data structures used in computer programming. It consists of a sequence of nodes, each containing arbitrary data fields and one or two references ("links") pointing to the next and/or previous nodes. A linked list is a self-referential data type because it contains a pointer or link to another data item of the same type. Linked lists permit insertion and removal of nodes at any point in the list in constant time (given a reference to the node at that point), but do not allow random access. Several different types of linked list exist: singly-linked lists, doubly-linked lists, and circularly-linked lists.

Teaching remark
Our discussion of linked lists will be confined to linear linked lists, i.e. singly-linked lists and doubly-linked lists, so as to comply with the ALCS curriculum.

To establish a link from one list element to another, a linked list element usually stores the address of another list element in addition to the data item. This requires the implementation programming language to support address pointers. If the use of address pointers is not allowed, an array of records can be used to implement a linked list.

The following diagram shows a singly-linked list of three elements linked by address pointers.

(Source: http://en.wikipedia.org/wiki/Linked_list#Linked_lists_using_arrays_of_nodes)

In the diagram, the list element that contains “37” is the last list element. Its address pointer has a null value, which is used to indicate the end of a list.
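A singly-linked list that supports the four basic list operations can be sketched as follows. This is a minimal sketch, not a definitive implementation: the names `Node`, `LinkedList`, `add_head`, `get_head`, `is_empty` and `to_python_list` are assumptions, Python's `None` stands in for the null pointer, and only the value 37 comes from the text (12 and 25 are illustrative).

```python
from typing import Optional


class Node:
    """One list element: a data field plus a pointer to the next element."""

    def __init__(self, data, next_node: "Optional[Node]" = None):
        self.data = data
        self.next = next_node      # None plays the role of the null pointer


class LinkedList:
    def __init__(self):
        self.head: Optional[Node] = None    # operation 1: create an empty list

    def is_empty(self) -> bool:
        """Operation 4: test whether the list is empty."""
        return self.head is None

    def add_head(self, data) -> None:
        """Operation 2: the new node points to the old head and becomes the head."""
        self.head = Node(data, self.head)

    def get_head(self):
        """Operation 3: return the data stored in the first element."""
        if self.head is None:
            raise IndexError("head of an empty list")
        return self.head.data

    def to_python_list(self) -> list:
        """Follow the pointers from the head until the null pointer is reached."""
        items = []
        node = self.head
        while node is not None:
            items.append(node.data)
            node = node.next
        return items


numbers = LinkedList()
for value in (37, 25, 12):     # added in reverse, so 37 ends up as the last element
    numbers.add_head(value)
```

After these calls the traversal visits 12, 25 and 37 in that order, and the node holding 37 has `next` set to `None`.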
The diagram below shows a doubly-linked list with elements linked by forward address pointers and backward address pointers.

(Source: http://en.wikipedia.org/wiki/Linked_list#Linked_lists_using_arrays_of_nodes)

Readers may notice an important difference between the singly and doubly linked list examples: the omission of a head node in the singly linked list. Whether a head node is required is a program design decision. In general, the linked list implementation with a head node is preferred as it makes the addition of a list element as the list head easier.

Linear linked lists

Linear linked lists are used to store data items that are accessed in a linear order. For singly linked lists, the access order is restricted to the forward direction: the address pointer of a list element points to the next list element. Doubly linked lists can be accessed in either the forward or the backward direction, as two address pointers are used: one points to the preceding list element and the other points to the subsequent list element.

The following linked list is introduced to illustrate the insertion of a list element. The contents of the pointers are set arbitrarily in the example. For simplicity, we omit the head node of the list.

Node          Pointer field
Data item 1   35
Data item 2   59
Data item 3   47
Data item 4   87
Data item 5   Null

Suppose we want to add a new data item called “Data item X” as the 3rd element of the list. The following steps are involved.

1. Create a new node (say, node X) in memory to store Data item X. For illustration purposes, we assume that node X is located at address 99.
2. Store the relevant information in the data field of node X.
3. Locate the 2nd list element from the beginning of the list.
4. Copy the pointer field of the 2nd list element (i.e. the node that contains “Data item 2”) to the pointer field of node X, so that the latter points to the list element in the 3rd position of the original list (i.e. the node that contains “Data item 3”).
5. Change the content of the pointer field of the 2nd list element (i.e. the node that contains “Data item 2”) to address 99 so that it points to node X.

After steps 1, 2, 3 and 4:

Node          Pointer field
Data item 1   35
Data item 2   59
Data item X   59
Data item 3   47
Data item 4   87
Data item 5   Null

After step 5:

Node          Pointer field
Data item 1   35
Data item 2   99
Data item X   59
Data item 3   47
Data item 4   87
Data item 5   Null

Queue

A queue is a first-in-first-out (FIFO) list. A new data item can only be added to the back of a queue, while removal of a data item is restricted to the front of the queue. This ensures that the data item that has stayed in the queue for the longest time will be removed next. The notion of a queue can be found in many real-life scenarios, e.g. queuing for a bus, queuing to buy a cinema ticket and queuing for payment at a supermarket’s cashier counter.

Some basic queue operations are given below.

1. Create a queue (which is empty)
2. Put a new element at the rear end of a queue
3. Get an element from the front end of a queue
4. Test whether a queue is empty

Suppose there is an empty queue and the following instructions are executed on that queue.

1. put (10)
2. put (3)
3. put (6)
4. put (5)
5. t = get()
6. put (9)
7. i = get()

The queue content upon execution of each of the above instructions is shown below.
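The queue operations and the seven instructions above can be replayed with a short Python sketch. It is a minimal illustration under stated assumptions: `put` and `get` follow the text, while the class name `Queue`, the helpers `is_empty` and `contents`, and the choice of `collections.deque` as the underlying storage are assumptions made for the example.

```python
from collections import deque


class Queue:
    """FIFO list: items are put at the rear and taken from the front."""

    def __init__(self):
        self._items = deque()          # create a queue (which is empty)

    def is_empty(self) -> bool:
        return not self._items

    def put(self, item) -> None:
        self._items.append(item)       # add at the rear end

    def get(self):
        if self.is_empty():
            raise IndexError("get from an empty queue")
        return self._items.popleft()   # remove from the front end

    def contents(self) -> list:
        """Items from front to rear, for inspection only."""
        return list(self._items)


queue = Queue()
for value in (10, 3, 6, 5):    # instructions 1 to 4
    queue.put(value)
t = queue.get()                # instruction 5: removes 10, the oldest item
queue.put(9)                   # instruction 6
i = queue.get()                # instruction 7: removes 3
```

Calling `queue.contents()` after any instruction returns the queue from front to rear at that point.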
Step  Instruction   Queue (front to rear)   Result
1     put (10)      10
2     put (3)       10 3
3     put (6)       10 3 6
4     put (5)       10 3 6 5
5     t = get()     3 6 5                   t = 10
6     put (9)       3 6 5 9
7     i = get()     6 5 9                   i = 3

Stack

A stack is a last-in-first-out (LIFO) list. Addition and removal of data items can only be done at the top of a stack. This ensures that the data item that has stayed in the stack for the shortest time will be removed next. Examples of stacks in real life are a stack of plates and a pile of documents. In programming, a stack is used to store the program status when execution branches to a separate subprogram. The program status is restored when the subprogram finishes its execution and control returns to the program.

Some basic stack operations are given below.

1. Create a stack (which is empty)
2. Push a new element onto a stack
3. Pop an element from a stack
4. Test whether a stack is empty

Suppose there is an empty stack and the following instructions are executed on that stack.

1. push (10)
2. push (3)
3. push (6)
4. push (5)
5. t = pop()
6. push (9)
7. i = pop()

For simplicity, we assume that the stack can store up to six data items. The stack content upon execution of each of the above instructions is shown below.
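The stack operations and the seven instructions above can be replayed in the same way. This is a minimal sketch under stated assumptions: `push` and `pop` follow the text, the six-item capacity follows the simplifying assumption above, and the class name `Stack`, the helpers `is_empty` and `contents_top_first`, and the use of a Python list as storage are assumptions made for the example.

```python
class Stack:
    """LIFO list: items are pushed onto and popped from the top only."""

    def __init__(self, capacity: int = 6):
        self._items = []               # create a stack (which is empty)
        self._capacity = capacity      # the text assumes room for six items

    def is_empty(self) -> bool:
        return not self._items

    def push(self, item) -> None:
        if len(self._items) >= self._capacity:
            raise OverflowError("stack is full")
        self._items.append(item)       # the end of the Python list is the top

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from an empty stack")
        return self._items.pop()       # remove the most recently pushed item

    def contents_top_first(self) -> list:
        """Items from top to bottom, for inspection only."""
        return self._items[::-1]


stack = Stack()
for value in (10, 3, 6, 5):    # instructions 1 to 4
    stack.push(value)
t = stack.pop()                # instruction 5: removes 5, the newest item
stack.push(9)                  # instruction 6
i = stack.pop()                # instruction 7: removes 9
```

Calling `stack.contents_top_first()` after any instruction returns the stack from top to bottom at that point.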
Step  Instruction   Stack (top first)   Result
1     push (10)     10
2     push (3)      3 10
3     push (6)      6 3 10
4     push (5)      5 6 3 10
5     t = pop()     6 3 10              t = 5
6     push (9)      9 6 3 10
7     i = pop()     6 3 10              i = 9

Binary Search Tree

A binary search tree is a data structure for storing ordered data items in such a way that data retrieval can be done much more efficiently than with linear ordered structures such as a list. Wikipedia describes the properties of binary search trees as follows: Each node has a value. A total order is defined on these values. The left subtree of a node contains only values less than the node’s value. The right subtree of a node contains only values greater than or equal to the node’s value.

In a binary tree, a node can be the parent of zero, one or two nodes. A node without a child is called a leaf node. Every node except the root node has a parent; the root node has no parent.

Teaching remark
Note that the notion of left and right subtrees is artificial. In reality, all nodes’ values are stored in a one-dimensional memory space. Having said that, such a notion eases the explanation of the key advantage of binary search trees.

A binary search tree allows three basic operations:

1. Search the tree for a node
2. Insert a node
3. Delete a node

Example 1: Searching for a node

Suppose we have an initial binary search tree as below. Let us further assume that 27 is the node that we want to locate.

Step 1: The search starts from the root node, i.e. 52.
Step 2: Since 27 is smaller than 52, the search proceeds to the left subtree.
Step 3: Since 27 is greater than 23, the search proceeds to the right subtree.
Step 4: The desired node is found and the search stops.

If the node to be located has not been found after a leaf node is reached, the node does not exist in the search tree.

The height of a binary tree is defined as the length of the path from the root node to its furthest leaf. Obviously, the number of data comparisons needed to search for a particular data item in a binary search tree of height h is no more than (h + 1). A binary search tree degenerates into a list if the tree is skewed entirely to the left-hand side or the right-hand side. Only in such cases will the tree height (say h) be as large as the tree size (say n) permits, i.e. h = n - 1 under the definition above. That means only in the worst case will a binary search tree require the same number of data comparisons to locate an item as a linear list search. In other words, a binary search tree is superior to a linear search on average.

Example 2: Inserting a node

We use the previous binary search tree again. This time, we want to add a 58-node to the binary tree. The task is to get to the point of insertion.

Step 1: The search starts from the root node, i.e. 52.
Step 2: Since 58 is greater than 52, the search proceeds to the right subtree.
Step 3: Since 58 is smaller than 70, the search proceeds to the left subtree.
Step 4: Since 58 is smaller than 64 and the 64-node is a leaf node, the 58-node is added as the left-hand child of the 64-node.

Teaching remark
The above simple insertion procedure can result in a skewed tree. For example, if 57, 56, 55, 54 and 53 are subsequently added to the tree, all the new nodes will be organized in a list form within the tree.
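The search and insertion walks of Examples 1 and 2, and the skewing effect just described, can be sketched in Python as follows. This is a rough illustration under stated assumptions: the names `TreeNode`, `insert`, `search` and `height` are hypothetical, the tree is built only from the keys named in the text (52, 23, 27, 70, 64), and the original figure may contain further nodes that are not reproduced here.

```python
from typing import List, Optional, Tuple


class TreeNode:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["TreeNode"] = None
        self.right: Optional["TreeNode"] = None


def insert(root: Optional[TreeNode], key: int) -> TreeNode:
    """Simple (unbalanced) insertion; keys equal to a node's key go right."""
    if root is None:
        return TreeNode(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = TreeNode(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = TreeNode(key)
                return root
            node = node.right


def search(root: Optional[TreeNode], key: int) -> Tuple[bool, List[int]]:
    """Return (found, keys of the nodes compared along the way)."""
    visited = []
    node = root
    while node is not None:
        visited.append(node.key)
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited


def height(root: Optional[TreeNode]) -> int:
    """Number of links from the root to its furthest leaf (-1 for an empty tree)."""
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))


root = None
for key in (52, 23, 70, 27, 64):       # only the keys mentioned in the text
    root = insert(root, key)

found_27, path_27 = search(root, 27)   # Example 1: visits 52, 23, 27
root = insert(root, 58)                # Example 2: 58 becomes the left child of 64
found_58, path_58 = search(root, 58)   # visits 52, 70, 64, 58
height_before = height(root)

for key in (57, 56, 55, 54, 53):       # each new key hangs to the left of the previous one
    root = insert(root, key)
height_after = height(root)
```

With this reduced tree the height grows from 3 to 8 after the five extra insertions, one extra level per inserted key, which is the list-like behaviour the remark warns about.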
Besides, the simple insertion procedure produces a different tree when the insertion order is changed. An idea called tree balancing can alleviate these problems, but the topic is outside the scope of the ALCS curriculum.

Example 3: Deleting a node

Deleting a node from a binary search tree is more complicated, and Wikipedia outlines the cases to be considered as follows:

Deleting a leaf: Deleting a node with no children is easy, as we can simply remove it from the tree.
Deleting a node with one child: Delete it and replace it with its child.
Deleting a node with two children: Suppose the node to be deleted is called N. We replace node N with either its in-order successor (the left-most child of the right subtree) or its in-order predecessor (the right-most child of the left subtree).

Example: Delete the 70-node

In the following example, we illustrate the deletion of a node with two children (the 70-node) from the binary search tree below. In the example, we replace the node by its in-order predecessor, i.e. the 64-node (the right-most child of the left subtree).

Step 1: With the use of the search function, locate the node to be removed (i.e. the 70-node). If the node cannot be found, stop processing.
Step 2: Replace the 70-node by its in-order predecessor (the right-most child of the left subtree), which is the 64-node.
Step 3: As the right-most child of the left subtree of the node for deletion (i.e. the 70-node), the 64-node has no right subtree. The right subtree of the deleted node is made the right subtree of the replacement node (i.e. the 64-node).