World Applied Programming, Vol (1), No (2), June 2011. 105-109
ISSN: 2222-2510 ©2011 WAP journal. www.waprogramming.com

Searching Algorithms

C. Canaan*, M. S. Garai, M. Daya
Information Institute, Chiredzi, Zimbabwe
[email protected], [email protected], [email protected]

Abstract: Here we present an introduction to searching algorithms. We first discuss the general settings in which search algorithms are applied: virtual search spaces, sub-structures of a given structure, and quantum computers. We then introduce some simple and popular search algorithms, namely linear search, selection search, and binary search, to familiarize the reader with the implementation of searching algorithms.

Key words: search algorithms • linear search • selection search • binary search

I. INTRODUCTION

In computer science, a search algorithm, broadly speaking, is an algorithm for finding an item with specified properties among a collection of items. The items may be stored individually as records in a database; or may be elements of a search space defined by a mathematical formula or procedure, such as the roots of an equation with integer variables; or a combination of the two, such as the Hamiltonian circuits of a graph [1].

II. VIRTUAL SEARCH SPACES

Algorithms for searching virtual spaces are used in constraint satisfaction problems, where the goal is to find a set of value assignments to certain variables that will satisfy specific mathematical equations and inequations. They are also used when the goal is to find a variable assignment that will maximize or minimize a certain function of those variables.
Algorithms for these problems include the basic brute-force search (also called "naïve" or "uninformed" search), and a variety of heuristics that try to exploit partial knowledge about the structure of the space, such as linear relaxation, constraint generation, and constraint propagation.

An important subclass is the local search methods, which view the elements of the search space as the vertices of a graph, with edges defined by a set of heuristics applicable to the case, and scan the space by moving from item to item along the edges, for example according to the steepest-descent or best-first criterion, or in a stochastic search. This category includes a great variety of general metaheuristic methods, such as simulated annealing, tabu search, A-teams, and genetic programming, that combine arbitrary heuristics in specific ways.

This class also includes various tree search algorithms, which view the elements as vertices of a tree and traverse that tree in some special order. Examples of the latter include exhaustive methods such as depth-first search and breadth-first search, as well as various heuristic-based search tree pruning methods such as backtracking and branch and bound. Unlike general metaheuristics, which at best work only in a probabilistic sense, many of these tree-search methods are guaranteed to find the exact or optimal solution, if given enough time.

Another important sub-class consists of algorithms for exploring the game tree of multiple-player games, such as chess or backgammon, whose nodes consist of all possible game situations that could result from the current situation. The goal in these problems is to find the move that provides the best chance of a win, taking into account all possible moves of the opponent(s). Similar problems occur when
humans or machines have to make successive decisions whose outcomes are not entirely under one's control, such as in robot guidance or in marketing, financial, or military strategy planning. This kind of problem has been extensively studied in the context of artificial intelligence. Examples of algorithms for this class are the minimax algorithm, alpha-beta pruning, and the A* algorithm [2].

III. SUB-STRUCTURES OF A GIVEN STRUCTURE

The name combinatorial search is generally used for algorithms that look for a specific sub-structure of a given discrete structure, such as a graph, a string, a finite group, and so on. The term combinatorial optimization is typically used when the goal is to find a sub-structure with a maximum (or minimum) value of some parameter. (Since the sub-structure is usually represented in the computer by a set of integer variables with constraints, these problems can be viewed as special cases of constraint satisfaction or discrete optimization; but they are usually formulated and solved in a more abstract setting where the internal representation is not explicitly mentioned.)

An important and extensively studied subclass is the graph algorithms, in particular graph traversal algorithms, for finding specific sub-structures in a given graph, such as subgraphs, paths, circuits, and so on. Examples include Dijkstra's algorithm, Kruskal's algorithm, the nearest neighbour algorithm, and Prim's algorithm. Another important subclass of this category is the string searching algorithms, which search for patterns within strings. Two famous examples are the Boyer–Moore and Knuth–Morris–Pratt algorithms; there are also several algorithms based on the suffix tree data structure.

IV. QUANTUM COMPUTERS

There are also search methods designed for (currently non-existent) quantum computers, like Grover's algorithm, that are theoretically faster than linear or brute-force search even without the help of data structures or heuristics.

V.
SIMPLE SEARCH ALGORITHMS

In this section we introduce some of the simple and popular searching algorithms: linear search, selection search, and binary search.

Linear search

In computer science, linear search or sequential search is a method for finding a particular value in a list, which consists of checking every one of its elements, one at a time and in sequence, until the desired one is found [2]. Linear search is the simplest search algorithm; it is a special case of brute-force search. Its worst-case cost is proportional to the number of elements in the list; and so is its expected cost, if all list elements are equally likely to be searched for. Therefore, if the list has more than a few elements, other methods (such as binary search or hashing) will be faster, but they also impose additional requirements.

For a list with n items, the best case is when the value is equal to the first element of the list, in which case only one comparison is needed. The worst case is when the value is not in the list (or occurs only once at the end of the list), in which case n comparisons are needed. If the value being sought occurs k times in the list, and all orderings of the list are equally likely, the expected number of comparisons is (n + 1)/(k + 1). For example, if the value being sought occurs once in the list, and all orderings of the list are equally likely, the expected number of comparisons is (n + 1)/2. However, if it is known that it occurs exactly once, then at most n − 1 comparisons are needed, and the expected number of comparisons is (n − 1)(n + 2)/(2n) (for example, for n = 2 this is 1, corresponding to a single if-then-else construct). Either way, asymptotically the worst-case cost and the expected cost of linear search are both O(n).
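The (n + 1)/2 figure can be sanity-checked by averaging, over every possible position of the sought value, the number of comparisons linear search performs. The following is a minimal sketch in Python; the helper `comparisons_to_find` is not part of the paper and is introduced only for this check:

```python
def comparisons_to_find(lst, value):
    """Count how many comparisons linear search makes before finding value."""
    for count, item in enumerate(lst, start=1):
        if item == value:
            return count
    return len(lst)  # value absent: every element was compared

# Average the cost over all n possible positions of a single occurrence.
n = 9
costs = [comparisons_to_find([0] * i + [1] + [0] * (n - i - 1), 1)
         for i in range(n)]
print(sum(costs) / n)  # 5.0, matching (n + 1)/2
```

Since the target at position i costs exactly i comparisons, the average is (1 + 2 + ... + n)/n = (n + 1)/2, as the analysis above states.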
The following pseudocode describes a typical variant of linear search, where the result of the search is supposed to be either the location of the list item where the desired value was found, or an invalid location Λ, to indicate that the desired element does not occur in the list.

    For each item in the list:
        if that item has the desired value,
            stop the search and return the item's location.
    Return Λ.

In this pseudocode, the last line is executed only after all list items have been examined with none matching. If the list is stored as an array data structure, the location may be the index of the item found (usually between 1 and n, or 0 and n−1). In that case the invalid location Λ can be any index before the first element (such as 0 or −1, respectively) or after the last one (n+1 or n, respectively). If the list is a simply linked list, then the item's location is its reference, and Λ is usually the null pointer.

Linear search can also be described as a recursive algorithm:

    LinearSearch(value, list)
        if the list is empty, return Λ;
        else if the first item of the list has the desired value, return its location;
        else return LinearSearch(value, remainder of the list)

Selection algorithm

In computer science, a selection algorithm is an algorithm for finding the kth smallest number in a list (such a number is called the kth order statistic). This includes the cases of finding the minimum, maximum, and median elements. There are O(n) (worst-case linear time) selection algorithms. Selection is a subproblem of more complex problems like the nearest neighbor problem and shortest path problems. The term "selection" is used in other contexts in computer science, including the stage of a genetic algorithm in which genomes are chosen from a population for later breeding.

Selection can be reduced to sorting by sorting the list and then extracting the desired element.
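The two pseudocode variants above translate directly into a concrete language. The following is a sketch in Python (not part of the original paper), where `None` plays the role of the invalid location Λ:

```python
def linear_search(value, lst):
    """Iterative linear search: return the index of value, or None (i.e. Λ)."""
    for index, item in enumerate(lst):
        if item == value:
            return index
    return None  # reached only after all items were examined

def linear_search_recursive(value, lst, offset=0):
    """Recursive variant, mirroring the LinearSearch pseudocode above."""
    if not lst:
        return None                     # the list is empty: return Λ
    if lst[0] == value:
        return offset                   # first item matches: return its location
    return linear_search_recursive(value, lst[1:], offset + 1)
```

For example, `linear_search(7, [4, 7, 9])` returns 1, and both variants return `None` when the value is absent.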
This method is efficient when many selections need to be made from a list, in which case only one initial, expensive sort is needed, followed by many cheap extraction operations. In general, this method requires O(n log n) time, where n is the length of the list.

Linear-time algorithms to find minima or maxima work by iterating over the list and keeping track of the minimum or maximum element so far. Using the same ideas, we can construct a simple but inefficient general algorithm for finding the kth smallest or kth largest item in a list, requiring O(kn) time, which is effective when k is small. To accomplish this, we repeatedly find the most extreme value among the remaining elements and move it to the front, until we reach our desired index. This can be seen as an incomplete selection sort. Here is the minimum-based algorithm:

    function select(list[1..n], k)
        for i from 1 to k
            minIndex = i
            minValue = list[i]
            for j from i+1 to n
                if list[j] < minValue
                    minIndex = j
                    minValue = list[j]
            swap list[i] and list[minIndex]
        return list[k]

Other advantages of this method are:
- After locating the jth smallest element, it requires only O(j + (k − j)²) time to find the kth smallest element, or only O(k) for k ≤ j.
- It can be done with linked list data structures, whereas the algorithm based on partition requires random access.
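The minimum-based pseudocode above is an incomplete selection sort: k passes of selection sort, each placing the next-smallest element at the front. A minimal Python sketch (not from the original paper), using the pseudocode's 1-based k mapped onto 0-based indexing:

```python
def select(lst, k):
    """Return the kth smallest element of lst (k = 1 is the minimum).

    Performs k passes of selection sort, so O(k*n) time; lst is
    modified in place, as in the pseudocode's swap.
    """
    n = len(lst)
    for i in range(k):                  # pseudocode's "for i from 1 to k"
        min_index = i
        for j in range(i + 1, n):
            if lst[j] < lst[min_index]:
                min_index = j
        lst[i], lst[min_index] = lst[min_index], lst[i]
    return lst[k - 1]
```

For example, `select([7, 2, 9, 4], 2)` returns 4, the second smallest element; after the call, the two smallest elements sit sorted at the front of the list, which is why a later query with a smaller k is cheap.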
Depending on the outcome, the algorithm then starts over, searching only the top or the bottom subset of the array's elements. If the input is not located within the array, the algorithm will usually output a unique value indicating this. Binary search algorithms typically halve the number of items to check with each successive iteration, thus locating the given item (or determining its absence) in logarithmic time. A binary search is a dichotomic divide-and-conquer search algorithm.

It is useful for finding where an item is in a sorted array. For example, to search an array of contact information, with people's names, addresses, and telephone numbers sorted by name, binary search could be used to find out a few useful facts: whether the person's information is in the array, what the person's address is, and what the person's telephone number is.

Binary search will take far fewer comparisons than a linear search, but there are some downsides. Binary search can be slower than using a hash table. If items are changed, the array will have to be re-sorted so that binary search will work properly, which can take so much time that the savings from using binary search aren't worth it. If you can tell ahead of time that a few items are disproportionately likely to be sought, putting those items first and using a linear search could be much faster.

With each test that fails to find a match at the probed position, the search continues with one or the other of the two sub-intervals, each at most half the size. More precisely, if the number of items, N, is odd then both sub-intervals will contain (N − 1)/2 elements, while if N is even then the two sub-intervals contain N/2 − 1 and N/2 elements. If the original number of items is N, then after the first iteration there will be at most N/2 items remaining, then at most N/4 items, at most N/8 items, and so on.
In the worst case, when the value is not in the list, the algorithm must continue iterating until the span has been made empty; this will have taken at most ⌊log2(N)⌋ + 1 iterations, where the notation ⌊ ⌋ denotes the floor function that rounds its argument down to an integer. This worst-case analysis is tight: for any N there exists a query that takes exactly ⌊log2(N)⌋ + 1 iterations. When compared to linear search, whose worst-case behavior is N iterations, we see that binary search is substantially faster as N grows large. For example, to search a list of one million items takes as many as one million iterations with linear search, but never more than twenty iterations with binary search. However, a binary search can only be performed if the list is in sorted order.

The following incorrect (see notes below) algorithm is slightly modified (to avoid overflow) from Niklaus Wirth's in standard Pascal [5]:

    min := 1;
    max := N; {array size: var A : array [1..N] of integer}
    repeat
        mid := min + (max - min) div 2;
        if x > A[mid] then
            min := mid + 1
        else
            max := mid - 1;
    until (A[mid] = x) or (min > max);

Note 1: In the programming language of the code above, array indexes start from 1. For languages that use 0-based indexing (e.g. most modern languages), min and max should be initialized to 0 and N − 1, respectively.

Note 2: The code above does not return a result, nor does it indicate whether the element was found.

Note 3: The code above will not work correctly for empty arrays, because it attempts to access an element before checking whether min > max.

This code uses inclusive bounds and a three-way test (for early loop termination in case of equality), but with two separate comparisons per iteration. It is not the most efficient solution.

VI. CONCLUSION

In this paper, we examined the searching problem and investigated different solutions.
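The three notes above can all be addressed at once. Here is a corrected sketch in Python (not the paper's Pascal code): it returns the index of the element or −1 when absent, and it handles empty arrays because the bounds are checked before any element is accessed.

```python
def binary_search(a, x):
    """Binary search in sorted list a: return the index of x, or -1 if absent."""
    low, high = 0, len(a) - 1           # inclusive bounds, 0-based (see Note 1)
    while low <= high:                  # checked first, so empty lists are safe
        mid = low + (high - low) // 2   # avoids overflow in fixed-width languages
        if a[mid] == x:
            return mid                  # found: report the position (see Note 2)
        elif a[mid] < x:
            low = mid + 1               # continue in the upper sub-interval
        else:
            high = mid - 1              # continue in the lower sub-interval
    return -1                           # span is empty: x is not in the list
```

For example, `binary_search([1, 3, 5, 7], 5)` returns 2, and `binary_search([], 5)` returns −1 instead of crashing.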
We discussed the most popular simple algorithms for searching lists: linear search, selection search, and binary search, along with the broader settings of virtual search spaces, sub-structures of a given structure, and quantum computers. Each algorithm was described, its computational complexity was indicated for the worst, average, and best cases where available, and implementation code was provided.

REFERENCES

[1] Wikipedia. http://www.wikipedia.com
[2] Donald Knuth (1997). The Art of Computer Programming, Vol. 3: Sorting and Searching (3rd ed.). Addison-Wesley. pp. 396–408. ISBN 0-201-89685-0.
[3] Introduction to Algorithms. http://en.wikipedia.org/wiki/Introduction_to_Algorithms
[4] Binary Search. http://mathworld.wolfram.com/BinarySearch.html
[5] Niklaus Wirth: Algorithms + Data Structures = Programs. Prentice-Hall, 1975. ISBN 0-13-022418-9.