Data structures and complexity

Complexity
- Computational complexity refers to how much computing is required to solve different problems.
- Spatial complexity refers to how much memory is required to solve different problems.
- Choose the right algorithm and the right data structure and your code could run in seconds. Choose the wrong algorithm or the wrong data structure and your code could run for days.

Search: I’m thinking of a word...
- Given a finite list of words, how do you find out which one I’m thinking of?
- Ground rules
    - The word has to be in a dictionary, e.g. a dictionary with 60,000 words.
    - You can only ask me questions with YES/NO answers.
- Sequential search
    - Inspect every element and check to see if it’s the one you are looking for.
    - Amount of “effort” is proportional to the length of the list, e.g. worst case: 60,000 questions, average case: 30,000 questions.
- Binary search
    - Ask questions that cut the number of possibilities in half (this works because a dictionary is sorted).
    - Amount of effort is proportional to the logarithm (base 2) of the length of the list, i.e. about 16 questions for 60,000 words.

Binary search
    60,000 words: in 1st ½ of set?
        yes -> 30,000: in 1st ½ of subset?
            yes -> 15,000: in 1st ½ of subset? ...
            no  -> 15,000: in 1st ½ of subset? ...
        no  -> 30,000: in 1st ½ of subset?
            yes -> 15,000: in 1st ½ of subset? ...
            no  -> 15,000: in 1st ½ of subset? ...
    ~16 questions needed to get down to a subset with 1 word

Queues
- A queue stores items in FIFO (first-in first-out) order.
- It returns them in the same order that they were entered, like a line of people at a cashier.
- Useful for letting one chunk of code collect (or generate) items to be processed, while a separate chunk of code does the actual processing.
    - mouse clicks
    - internet TCP/IP packets
- Terminology
    - Enqueue -- get in line
    - Dequeue -- get out of line (reach the cashier)

Stacks
- A stack stores items in LIFO (last-in first-out) order.
- It returns them in the reverse of the order in which they were entered, like plates stacked in a cafeteria: the last plate put on the stack is the first one taken off.
- Useful when operations need to be broken down into sub-operations that are executed in sequence (especially recursive operations).
    - file search in a file system
    - parsing
- Terminology
    - push -- put a plate on the stack
    - pop -- remove a plate from the stack

Stacks: example
- Infix notation: ((1+2)*4)+3
- Postfix notation: 1, 2, +, 4, *, 3, +
- Evaluate postfix expressions with a stack
    1. if operand, push onto stack
    2. if operator, pop, pop, evaluate, push result

    Input   Operations             Stack
    1       Push                   (1)
    2       Push                   (2,1)
    +       Pop, Pop, Add, Push    (3)
    4       Push                   (4,3)
    *       Pop, Pop, Mul, Push    (12)
    3       Push                   (3,12)
    +       Pop, Pop, Add, Push    (15)

More complex data structures

Recursive data structures
- Example: Binary trees
[Diagram: a binary tree. Each node holds a value plus left and right links; the top node is the root, the links are the branches, and the nodes at the bottom are the leaves.]

Creating a new node
- Represent each node by a hash with three keys: ‘LEFT’, ‘RIGHT’, and ‘VALUE’.
- The ‘VALUE’ will contain the content of the node.
- The values of ‘LEFT’ and ‘RIGHT’ are references to the child nodes (i.e. more hashes).
- Here is a subroutine that returns a reference to a node data structure. The argument of the subroutine is the value.

    sub newNode {
        return {
            'VALUE' => shift,    # the argument passed to newNode
            'LEFT'  => undef,    # no children yet
            'RIGHT' => undef,
        };
    }

Attaching a node
[Diagram: $root_ref and $someNode_ref each refer to a node (value, left, right); the assignment makes the root's left link refer to the second node.]

    $root_ref->{LEFT} = $someNode_ref;

Trees: in-order traversal

    traverse($theTree);

    sub traverse {
        my ($tree) = @_;
        if (!defined($tree)) { return undef }    # if no node
        traverse($tree->{LEFT});
        processTheNode($tree->{VALUE});          # e.g. print value
        traverse($tree->{RIGHT});
    }

Trees: insertion

    sub insert {    # -- recursively builds the tree
        my ($tree, $val) = @_;
        if (!$tree) {    # no node exists so create one
            $_[0] = newNode($val);    # $_[0] aliases the caller's variable or hash slot
            return;
        } else {         # a node exists, so insert
            if    ($tree->{VALUE} > $val) { insert($tree->{LEFT},  $val) }
            elsif ($tree->{VALUE} < $val) { insert($tree->{RIGHT}, $val) }
            else  { warn "dup insert of $val\n" if 0 }    # duplicate warning is disabled
        }
    }

Reminders
- Code examples in Readonly directory on Pinedalab
- Project coming up...

Complexity

Example: Search
- Given a list of ordered values, how do we find one? e.g.
    - Numbers in a list
    - Words in a dictionary
- The complexity depends on the data structure used to represent the set of objects and on the algorithm used to process the data structure.

Big-O notation
- Big-O notation is a way to express the asymptotic time complexity of a computer algorithm.

    O(1)         constant
    O(log(n))    logarithmic
    O(n)         linear
    O(n^2)       quadratic
    O(n^c)       polynomial
    O(c^n)       exponential

Linear search
- Represent the set of objects as a list and then sequentially search the list.
- Space complexity is proportional to the number of objects.
- Time complexity is proportional to the number of objects, i.e. O(n).

Binary search
- Represent the set of objects as a binary tree and search the tree.
- Space complexity is proportional to the number of objects.
- Time complexity?
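The linear-versus-logarithmic contrast from the word-guessing game can be sketched in a few lines. This is a minimal illustration, written in Python rather than the Perl used on the slides, and it runs binary search over a sorted list rather than a tree. The names `linear_search` and `binary_search`, the step counters, and the generated 60,000-entry word list are assumptions made for the example, not part of the course code.

```python
def linear_search(items, target):
    """Scan left to right; return (index, comparisons), index None if absent."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            return index, comparisons
    return None, comparisons


def binary_search(sorted_items, target):
    """Halve the candidate range each step; return (index, steps)."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1      # target can only be in the upper half
        else:
            high = mid - 1     # target can only be in the lower half
    return None, steps


# Hypothetical stand-in for a 60,000-word dictionary: zero-padded labels,
# so that alphabetical order matches numeric order.
words = [f"word{i:05d}" for i in range(60_000)]
target = words[-1]             # worst case for the sequential scan

linear_index, linear_steps = linear_search(words, target)
binary_index, binary_steps = binary_search(words, target)
print(linear_steps, binary_steps)   # prints: 60000 16
```

The sequential scan asks one question per word, while the halving strategy needs about log2(60,000), roughly 16, questions, but only because the list is sorted.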
Binary search
- Represent the set of objects as a binary tree and search the tree.

    sub lookup {
        my ($tree, $value) = @_;
        # numeric keys are assumed here; use eq and lt to compare strings
        if    (!$tree)                     { return; }          # not found
        elsif ($tree->{VALUE} == $value)   { return $tree; }    # found
        elsif ($value < $tree->{VALUE})    { return lookup($tree->{LEFT},  $value) }
        else                               { return lookup($tree->{RIGHT}, $value) }
    }

Binary search
- The search time depends on how deep in the tree you have to go to find the object.
- The depth of the tree depends on how it was constructed.
- Worst case: input was presorted
    depth = n, complexity: O(n)
- Best case: tree is balanced
    depth = log(n), complexity: O(log(n))
- Average case: if the input is random, it can be shown that the expected depth is proportional to log(n)
    complexity: O(log(n)) per search (building the whole tree from n random values takes O(n log(n)))

Hash table search
- Calculate a number from the key
    - Performed by a hash function
- Use the number to index into an array
- If more than one key hashes to the same index (a collision)
    - Maintain a list of keys that resolve to the same hash value

NP-Completeness
- A problem is tractable if some algorithm exists that always solves the problem in a time that is proportional to some power of the length of the input. Such problems are said to be solvable in polynomial time. (Of course, if the power is 50, then the problem is practically intractable.)
- A problem with no polynomial-time algorithm is said to be intractable.
- In the 1970s the class of NP-complete problems was defined. NP stands for nondeterministic polynomial (time). This class of problems has no known polynomial-time algorithms.

Salient properties of NP-complete problems
- No NP-complete problem has been proven to be solvable in polynomial time.
- No NP-complete problem has been proven to be unsolvable in polynomial time.
- All NP-complete problems are computationally equivalent in the following sense:
    - If any polynomial-time algorithm can be found to solve any NP-complete problem, then every NP-complete problem can be solved by some polynomial-time algorithm.
- Since so many computer scientists and mathematicians have tried unsuccessfully to find polynomial-time algorithms for so many NP-complete problems, few believe that such algorithms exist, but this hasn’t been proven either.

The harsh realities of life
- Many problems of interest in bioinformatics and computational biology are NP-complete.
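To make the gap between checking an answer and finding one concrete, here is a small sketch using subset sum, a well-known NP-complete problem. The slides do not name a specific problem, so the choice of subset sum, the toy input, and the function names `verify` and `brute_force_subset_sum` are all assumptions made for illustration (in Python rather than the Perl used on the slides). Checking a proposed subset takes time linear in its size, while the naive search below may examine all 2^n - 1 non-empty subsets.

```python
from itertools import combinations


def verify(numbers, subset_indices, target):
    """Fast check: does the proposed subset sum to the target?"""
    return sum(numbers[i] for i in subset_indices) == target


def brute_force_subset_sum(numbers, target):
    """Try every non-empty subset; return (indices or None, subsets_tried)."""
    tried = 0
    for size in range(1, len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            tried += 1
            if verify(numbers, indices, target):
                return indices, tried
    return None, tried


numbers = [3, 34, 4, 12, 5, 2]     # hypothetical toy input
solution, tried = brute_force_subset_sum(numbers, 9)
print(solution, tried)

# When no subset works, every one of the 2**n - 1 non-empty subsets is tried.
no_solution, exhaustive = brute_force_subset_sum(numbers, 1000)
print(exhaustive == 2 ** len(numbers) - 1)
```

Each extra input number doubles the number of subsets the exhaustive search may have to visit, which is the O(c^n) growth from the Big-O list. Cleverer exact algorithms exist, but none is known to run in polynomial time.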