Searching (and into Trees)

Searching techniques
- Searching algorithms in unordered “lists”
  - “Lists” may be arrays or linked lists
  - The algorithms are adaptable to either
  - Let’s start with arrays
- Searching algorithms in ordered “lists”
  - Primarily arrays with values sorted into order
  - We can exploit the order to search more efficiently
- The data held in the array might be
  - Simple (e.g. numbers, strings)
  - or more complex objects: the search will probably be based on a chosen key field in the objects (e.g., search student records by registration number)
- We will only consider simple data: the searching techniques are the same

Sequential search in an unordered list
- In these algorithms we will assume:
  - The data is integers, held in an array variable numbers, in random order
  - The number of data values is indicated by a variable size
  - The data is in elements indexed 0 to (size-1)
  - We are seeking the integer held in variable val
- The basic technique to be used is “sequential search”
  - Compare val with the value in numbers[0], then with that in numbers[1], etc.
  - We will look at two versions of an algorithm encoding this
  - Other adaptations are possible

Algorithm 1: Standard sequential search
Here is a basic search algorithm. It leaves its result in a variable called position:

    int position = 0;
    while (position < size) {
        if (numbers[position] == val) break;  // Exit loop if found
        position++;
    }

- If val is not present:
  - The entire array will be scanned, taking size steps
  - position will have a final value of size
- But if val is present:
  - break; makes the while loop terminate immediately
  - The average number of scanning steps expected is about size/2
- Easy to adapt to return a boolean, or throw an exception

If we are careful, we can combine the loop test and the array element check:

    int position = 0;
    while (numbers[position] != val && position < size) position++;

Is this correct?

And this variant?

    int position = 0;
    while (numbers[position++] != val && position < size);

Corrected Version
If we are careful, we can combine the loop test and the array element check:

    int position = 0;
    while (position < size && numbers[position] != val) position++;

1. The && test checks position < size first,
2. and if it is false, it does not check numbers[position] != val;
3. otherwise we would get an ArrayIndexOutOfBoundsException if val is not present!
4. This is called "conditional" or "short-circuit" behaviour: it applies to && and ||

Algorithm 2: Sequential search with a “sentinel”
- We can improve the basic search algorithm if the array numbers has one extra element, numbers[size], that is never used for actual data
- Instead we place a copy of the sought value there, so the search always succeeds
- This means that the loop does not need to carry out the “end of array” test: less work, so quicker

    int[] numbers = new int[size+1];
    ...
    int position = 0;
    numbers[size] = val;  // Insert "sentinel": a copy of the sought value
    while (numbers[position] != val) position++;
    return position;

- As before, position has the final value size if val is not present

We may be interested in an algorithm’s best case, worst case or average execution time.
For the sequential search algorithm (with or without sentinel):
- Best case is 1 step: O(1)
- Worst case is N steps: O(N)
- The actual average number of steps depends on the ratio of successful to unsuccessful searches:
  - The average of successful searches is about N/2 steps, and so is O(N)
  - All unsuccessful searches take N steps, which is O(N)
  - So overall the average complexity is O(N)

Searching an Ordered List
- Again we will assume:
  - The data is integers, held in an array sequence, so the data is in elements indexed 0 to (length-1)
  - But this time we assume that the values are held in ascending numerical order
  - We are seeking the integer held in val
- We could use the sequential search algorithm, but this does not take advantage of the knowledge that the data is ordered.
  (The complexity remains O(N).)
- Instead, we will take advantage of the ordering to improve search efficiency (i.e., to reduce the complexity)

Binary Search
If the data is already ordered, we can do much better than a linear time algorithm. Here is the scheme:
- Pick the middle element in the array
  - If it is equal to val, stop the search
  - If it is greater than val, search the lower half of the remaining array
  - If it is less than val, search the remaining upper half
- At each iteration:
  - We are searching in a remaining partition of the array
  - We cut the remaining partition in half, rather than just removing one element
- Example: Searching for 11 in 1, 3, 5, 7, 9, 11, 13
  - First compare with 7, so search in 9, 11, 13
  - Now compare with 11: found it, in two steps

Concretely:
- Let variable
  - low indicate the lowest element of the partition (index 0 initially)
  - high (h) indicate the highest element (size-1 initially)
  - middle (m) indicate the next element being tested
- The search for 11 proceeds like this:

    index:  0  1  2  3  4  5   6
    value:  1  3  5  7  9  11  13

    low=0  h=6  m=3   Not found, and 11>7, so low=(m+1)=4
    low=4  h=6  m=5   Found it, at index 5

An unsuccessful search: search for 10

    index:  0  1  2  3  4  5   6
    value:  1  3  5  7  9  11  13

    low=0  h=6  m=3   Not found, and 10>7, so low=(m+1)=4
    low=4  h=6  m=5   Not found, and 10<11, so h=(m-1)=4
    low=4  h=4  m=4   Not found, and 10>9, so low=(m+1)=5
    low=5  h=4        Now low>h, and the partition has “vanished”: the search has failed

Algorithm binarySearch:
INPUT: val (value of interest), sequence (sorted data)
OUTPUT: object or value of interest if it exists, null otherwise

    int low = 0, middle = 0, high = sequence.length - 1;
    while (high >= low) {
        middle = (high + low) / 2;
        if (sequence[middle] == val) return sequence[middle];  // Found it
        else if (sequence[middle] < val) low = middle + 1;     // Search upper half
        else high = middle - 1;                                // Search lower half
    }
    return -1;  // or null if an object-type

The outcomes:
- Ordinary loop exit when the indexes “cross”: not found (i.e. high < low)
- Loop exit on return: found (detect this by testing high >= low)

The Complexity of Binary Search
- Best case: val is exactly sequence[middle] at the first step
  - The search stops after the first step, so complexity is O(1)
- Worst case: this will be when we continue dividing until the “partition” contains only one value: then it is either equal to val or not
  - For 250 elements this turns out to be about 8 iterations
  - For 500 it is about 9
  - For 1000 it is about 10
  - Double the amount of data: add one step!
- In general: the size is approximately 2^steps
  - So the number of steps is approximately log2(size)
  - Complexity is O(log N)
  - For emphasis: double the amount of data, add one step!
- Average case: we don’t need to consider this: the worst case is very good!

    N         log2 N
    1         0
    2         1
    4         2
    8         3
    16        4
    32        5
    64        6
    128       7
    256       8
    512       9
    1024      10
    2048      11
    4096      12
    8192      13
    16384     14
    32768     15
    65536     16
    131072    17
    262144    18
    524288    19
    1048576   20

Trees
[Figure: a tree with the node labels Make Money Fast!, Stock Fraud, Ponzi Scheme, Bank Robbery]

What is a Tree
- In computer science, a tree is an abstract model of a hierarchical structure
- A tree consists of nodes with a parent-child relation
- Applications:
  - Organization charts
  - File systems
  - Programming environments
[Figure: organization chart for Computers”R”Us with the nodes Sales, Manufacturing, R&D, US, International, Europe, Asia, Canada, Laptops, Desktops]

Tree Terminology
- Root: node without parent (A)
- Internal node: node with at least one child (A, B, C, F)
- External node (a.k.a. leaf): node without children (E, I, J, K, G, H, D)
- Ancestors of a node: parent, grandparent, great-grandparent, etc.
- Depth of a node: number of ancestors
- Height of a tree: maximum depth of any node (3)
- Descendant of a node: child, grandchild, great-grandchild, etc.
- Subtree: tree consisting of a node and its descendants
[Figure: example tree drawn in four levels: A / B C D / E F G H / I J K, with one subtree marked]
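The terminology above can be checked on a small linked tree. The following is a minimal Java sketch, not part of the original slides. The class and method names (TreeTerminologyDemo, Node, addChild, exampleTree) are made up for illustration. The parent of each node is an assumption: the transcript only preserves the levels of the drawing, so the sketch assumes A has children B, C, D; B has E, F; C has G, H; and F has I, J, K, which agrees with the internal nodes, leaves and height listed on the slide.

```java
import java.util.ArrayList;
import java.util.List;

class TreeTerminologyDemo {
    /** A general tree node: an element, a parent link and a list of children. */
    static final class Node {
        final String element;
        Node parent;                                  // null for the root
        final List<Node> children = new ArrayList<>();

        Node(String element) { this.element = element; }

        /** Creates a child, links it in both directions, and returns it. */
        Node addChild(String childElement) {
            Node child = new Node(childElement);
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    static boolean isRoot(Node v)     { return v.parent == null; }
    static boolean isExternal(Node v) { return v.children.isEmpty(); }
    static boolean isInternal(Node v) { return !v.children.isEmpty(); }

    /** Depth = number of ancestors, counted by following parent links to the root. */
    static int depth(Node v) {
        int d = 0;
        for (Node p = v.parent; p != null; p = p.parent) d++;
        return d;
    }

    /** Height of the subtree rooted at v = maximum depth of any node below v, relative to v. */
    static int height(Node v) {
        int h = 0;
        for (Node c : v.children) h = Math.max(h, 1 + height(c));
        return h;
    }

    /** Builds the ASSUMED example tree (see the note above) and returns its root A. */
    static Node exampleTree() {
        Node a = new Node("A");
        Node b = a.addChild("B");
        Node c = a.addChild("C");
        a.addChild("D");
        b.addChild("E");
        Node f = b.addChild("F");
        c.addChild("G");
        c.addChild("H");
        f.addChild("I");
        f.addChild("J");
        f.addChild("K");
        return a;
    }

    public static void main(String[] args) {
        Node a = exampleTree();
        Node d = a.children.get(2);                   // A -> D
        Node f = a.children.get(0).children.get(1);   // A -> B -> F
        System.out.println("height of tree = " + height(a));
        System.out.println("depth of F     = " + depth(f));
        System.out.println("D is a leaf    = " + isExternal(d));
    }
}
```

Under the assumed shape, the height comes out as 3 (the path A, B, F, I), matching the value given on the slide, and F has depth 2 because its ancestors are B and A.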
Tree ADT
- We use positions to abstract nodes
- Generic methods:
  - integer size()
  - boolean isEmpty()
  - Iterator iterator()
  - Iterable positions()
- Accessor methods:
  - position root()
  - position parent(p)
  - Iterable children(p)
- Query methods:
  - boolean isInternal(p)
  - boolean isExternal(p)
  - boolean isRoot(p)
- Update method:
  - element replace(p, o)
- Additional update methods may be defined by data structures implementing the Tree ADT

Preorder Traversal
- A traversal visits the nodes of a tree in a systematic manner
- In a preorder traversal, a node is visited before its descendants
- Application: print a structured document

    Algorithm preOrder(v)
        visit(v)
        for each child w of v
            preOrder(w)

Example (the number is the order in which each node is visited):

    1  Make Money Fast!
    2      1. Motivations
    3          1.1 Greed
    4          1.2 Avidity
    5      2. Methods
    6          2.1 Stock Fraud
    7          2.2 Ponzi Scheme
    8          2.3 Bank Robbery
    9      References

Postorder Traversal
- In a postorder traversal, a node is visited after its descendants
- Application: compute space used by files in a directory and its subdirectories

    Algorithm postOrder(v)
        for each child w of v
            postOrder(w)
        visit(v)

Example (the number is the order in which each node is visited):

    9  cs16/
    3      homeworks/
    1          h1c.doc 3K
    2          h1nc.doc 2K
    7      programs/
    4          DDR.java 10K
    5          Stocks.java 25K
    6          Robot.java 20K
    8      todo.txt 1K

Ordered Binary Trees
- A binary tree is a tree with the following properties:
  - Each internal node has at most two children (exactly two for proper binary trees)
  - The children of a node are an ordered pair
- We call the children of an internal node left child and right child
- Alternative recursive definition: a binary tree is either
  - a tree consisting of a single node, or
  - a tree whose root has an ordered pair of children, each of which is a binary tree
- Applications:
  - arithmetic expressions
  - decision processes
  - searching
[Figure: example binary tree drawn in four levels: A / B C / D E F G / H I]

Arithmetic Expression Tree
- Binary tree associated with an arithmetic expression
  - internal nodes: operators
  - external nodes: operands
- Example: arithmetic expression tree for the expression (2 × (a - 1) + (3 × b))
[Figure: expression tree with operators +, ×, ×, - and operands 2, a, 1, 3, b]

Decision Tree
- Binary tree associated with a decision process
  - internal nodes: questions with yes/no answer
  - external nodes: decisions
- Example: dining decision

    Want a fast meal?
        Yes: How about coffee?
            Yes: Starbucks
            No:  Spike’s
        No: On expense account?
            Yes: Al Forno
            No:  Café Paragon

BinaryTree ADT
- The BinaryTree ADT extends the Tree ADT, i.e., it inherits all the methods of the Tree ADT
- Additional methods:
  - position left(p)
  - position right(p)
  - boolean hasLeft(p)
  - boolean hasRight(p)
- Update methods may be defined by data structures implementing the BinaryTree ADT

Inorder Traversal
- In an inorder traversal a node is visited after its left subtree and before its right subtree
- Application: draw a binary tree
  - x(v) = inorder rank of v
  - y(v) = depth of v

    Algorithm inOrder(v)
        if hasLeft(v)
            inOrder(left(v))
        visit(v)
        if hasRight(v)
            inOrder(right(v))

[Figure: binary tree whose nine nodes are labelled with their inorder ranks 1 to 9]

Print Arithmetic Expressions
- Specialization of an inorder traversal
  - print operand or operator when visiting node
  - print "(" before traversing left subtree
  - print ")" after traversing right subtree

    Algorithm printExpression(v)
        if hasLeft(v)
            print("(")
            printExpression(left(v))
        print(v.element())
        if hasRight(v)
            printExpression(right(v))
            print(")")

Output for the expression tree shown earlier: ((2 × (a - 1)) + (3 × b))

Evaluate Arithmetic Expressions
- Specialization of a postorder traversal
  - recursive method returning the value of a subtree
  - when visiting an internal node, combine the values of the subtrees

    Algorithm evalExpr(v)
        if isExternal(v)
            return v.element()
        else
            x ← evalExpr(left(v))
            y ← evalExpr(right(v))
            op ← operator stored at v
            return x op y

[Figure: expression tree with operands 2, 5, 1, 3, 2]

Linked Structure for Trees
A node is represented by an object storing
1. Element
2. Parent node
3. Sequence of children nodes
[Figure: a tree with nodes A, B, C, D, E, F and its linked representation]

Linked Structure for Binary Trees
A node is represented by an object storing
1. Element
2. Parent node
3. Left child node
4. Right child node
[Figure: a binary tree with nodes A, B, C, D, E and its linked representation]

Array-Based Representation of Binary Trees
- Nodes are stored in an array A
- Node v is stored at A[rank(v)]
  - rank(root) = 1
  - if node is the left child of parent(node), rank(node) = 2 × rank(parent(node))
  - if node is the right child of parent(node), rank(node) = 2 × rank(parent(node)) + 1
[Figure: array A with index 0 unused, holding A at index 1, B at 2, D at 3, ..., G at 10, H at 11, next to the tree with each node labelled by its rank]
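
The rank rules above translate directly into index arithmetic. The following is a minimal Java sketch, not taken from the slides. The names ArrayBinaryTreeDemo, leftRank, rightRank, parentRank, elementAt and exampleArray are hypothetical. Only the array slots that can be read off the transcript (A at 1, B at 2, D at 3, G at 10, H at 11) are filled; the others are left null. The parentRank helper is not on the slide: it is derived from the two child rules, since both 2r and 2r + 1 give r under integer division by 2.

```java
class ArrayBinaryTreeDemo {
    /** Rank of the left child: 2 * rank(parent). */
    static int leftRank(int rank)  { return 2 * rank; }

    /** Rank of the right child: 2 * rank(parent) + 1. */
    static int rightRank(int rank) { return 2 * rank + 1; }

    /** Derived rule: integer division undoes both child rules. Not meaningful for the root (rank 1). */
    static int parentRank(int rank) { return rank / 2; }

    /** Returns the element stored at the given rank, or null if no node is stored there. */
    static String elementAt(String[] a, int rank) {
        return (rank >= 1 && rank < a.length) ? a[rank] : null;
    }

    /** Fills only the slots recoverable from the transcript; index 0 is unused. */
    static String[] exampleArray() {
        String[] a = new String[12];
        a[1] = "A";
        a[2] = "B";
        a[3] = "D";
        a[10] = "G";
        a[11] = "H";
        return a;
    }

    public static void main(String[] args) {
        String[] a = exampleArray();
        System.out.println("left child of root  = " + elementAt(a, leftRank(1)));
        System.out.println("right child of root = " + elementAt(a, rightRank(1)));
        System.out.println("parent rank of G    = " + parentRank(10));
        System.out.println("parent rank of H    = " + parentRank(11));
    }
}
```

One consequence of this layout is that no parent or child links need to be stored; the cost is that a sparse tree leaves unused slots in the array, as the null entries in the sketch show.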