Data structures and complexity
Complexity
• Computational complexity refers to how much computing is required to solve different problems.
• Spatial complexity refers to how much memory is required to solve different problems.
• Choose the right algorithm and the right data structure and your code could run in seconds. Choose the wrong algorithm or the wrong data structure and your code could run for days.
Search: I’m thinking of a word...
• Given a finite list of words, how do you find out which one I’m thinking of?
• Ground rules
  - The word has to be in a dictionary, e.g. a dictionary with 60,000 words.
  - You can only ask me questions with YES/NO answers.
• Sequential search
  - Inspect every element and check to see if it’s the one you are looking for. Amount of “effort” is proportional to the length of the list, e.g. worst case: 60,000 questions, average case: 30,000 questions.
• Binary search
  - Ask questions that cut the number of possibilities in half. Amount of effort is proportional to the logarithm (base 2) of the length of the list, i.e. about 16 questions for 60,000 words.
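The two strategies can be compared by counting questions. The following is a minimal sketch, not part of the original slides: the word list is a made-up stand-in for a dictionary, and the names `sequential_questions` and `binary_questions` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the dictionary: 60,000 sorted "words".
my @words = map { sprintf "w%05d", $_ } 1 .. 60_000;

# Sequential search: one question ("is it this one?") per element.
sub sequential_questions {
    my ($list, $target) = @_;
    my $asked = 0;
    for my $word (@$list) {
        $asked++;
        return $asked if $word eq $target;
    }
    return $asked;    # not found: every element was inspected
}

# Binary search: each question ("is it in the first half?") halves the range.
sub binary_questions {
    my ($list, $target) = @_;
    my ($lo, $hi, $asked) = (0, $#$list, 0);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        $asked++;
        if ($target le $list->[$mid]) { $hi = $mid }        # yes: first half
        else                          { $lo = $mid + 1 }    # no: second half
    }
    return $asked;
}

my $last = $words[-1];
printf "sequential: %d questions\n", sequential_questions(\@words, $last);
printf "binary:     %d questions\n", binary_questions(\@words, $last);
```

The binary version assumes the list is sorted; without that, the "first half" question carries no information.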
Binary search
[Figure: decision tree. Starting from the full set of 60,000 words, each YES/NO answer to "In 1st ½ of set?" (then "In 1st ½ of subset?") halves the candidates: 60,000 → 30,000 → 15,000 → ...]
~16 questions needed to get down to a subset with 1 word
Queues
• A queue stores items in FIFO (first-in first-out) order.
• It returns them in the same order that they are entered, like a line of people at a cashier.
• Useful for letting one chunk of code collect (or generate) items to be processed, while a separate chunk of code does the actual processing.
  - mouse clicks
  - internet TCP/IP packets
• Terminology
  - Enqueue -- get in line
  - Dequeue -- get out of line (reach the cashier)
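In Perl a plain array can play the role of a queue. This is a small sketch under that assumption; the helper names `enqueue` and `dequeue` and the sample events are hypothetical, while `push` and `shift` are Perl built-ins.

```perl
use strict;
use warnings;

# A queue is passed around as an array reference.
sub enqueue { my ($q, @items) = @_; push @$q, @items; return }   # get in line
sub dequeue { my ($q) = @_; return shift @$q }                   # leave from the front

# Producer: collect some hypothetical events.
my @queue;
enqueue(\@queue, 'click A', 'click B', 'click C');

# Consumer: process them in arrival order.
my @processed;
while (defined(my $event = dequeue(\@queue))) {
    push @processed, $event;
}
print join(', ', @processed), "\n";   # click A, click B, click C
```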
Stacks
• A stack stores items in LIFO (last-in first-out) order.
• It returns them in the reverse of the order in which they were entered, like plates taken off a stack in a cafeteria.
• Useful when operations need to be broken down into sub-operations that are executed in sequence (especially recursive operations).
  - file search in file system
  - parsing
• Terminology
  - push -- put a plate on the stack
  - pop -- remove a plate from the stack
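A Perl array also works as a stack through the built-in `push` and `pop`. The sketch below only illustrates the LIFO order; the helper name `unstack` is hypothetical.

```perl
use strict;
use warnings;

# Returns the items in the order a stack would hand them back.
sub unstack {
    my @stack;
    push @stack, $_ for @_;                # push each plate onto the stack
    my @out;
    push @out, pop @stack while @stack;    # pop plates off the top
    return @out;
}

print join(' ', unstack(qw(first second third))), "\n";   # third second first
```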
Stacks: example
• Infix notation: ((1+2)*4)+3
• Postfix notation: 1, 2, +, 4, *, 3, +
• Evaluate postfix expressions with a stack
  1. if operand, push onto stack
  2. if operator, pop, pop, evaluate, push result

  Input   Operations          Stack
  1       Push                (1)
  2       Push                (2,1)
  +       Pop,Pop,Add,Push    (3)
  4       Push                (4,3)
  *       Pop,Pop,Mul,Push    (12)
  3       Push                (3,12)
  +       Pop,Pop,Add,Push    (15)
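The two evaluation rules translate almost directly into code. The following is a sketch, assuming tokens arrive as a list of numbers and the four operators `+ - * /`; the name `eval_postfix` is hypothetical.

```perl
use strict;
use warnings;

# Evaluate a list of postfix tokens; supports + - * / on numbers.
sub eval_postfix {
    my @stack;
    for my $token (@_) {
        if ($token =~ /^-?\d+(?:\.\d+)?$/) {
            push @stack, $token;                 # rule 1: operand, push
        }
        else {
            die "stack underflow at '$token'\n" if @stack < 2;
            my $right = pop @stack;              # popped first
            my $left  = pop @stack;              # popped second
            my $result = $token eq '+' ? $left + $right
                       : $token eq '-' ? $left - $right
                       : $token eq '*' ? $left * $right
                       : $token eq '/' ? $left / $right
                       : die "unknown operator '$token'\n";
            push @stack, $result;                # rule 2: push the result
        }
    }
    die "malformed expression\n" unless @stack == 1;
    return $stack[0];
}

print eval_postfix(qw(1 2 + 4 * 3 +)), "\n";   # 15
```

The order of the two pops matters for subtraction and division, which is why the first value popped is treated as the right-hand operand.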
More complex data structures
Recursive data structures
• Example: Binary trees
[Figure: a binary tree. Every node holds a value plus left and right links; the topmost node is labeled root, the links between nodes are labeled branches, and the bottom nodes are labeled leaves.]
Creating a new node
• Represent each node by a hash with three keys: ‘LEFT’, ‘RIGHT’, and ‘VALUE’.
  - The ‘VALUE’ will contain the content of the node.
  - The values of ‘LEFT’ and ‘RIGHT’ are references to the child nodes (i.e. more hashes).
• Here is a subroutine that returns a reference to a new node data structure. The argument of the subroutine is the value to store.

    sub newNode {
        my ($value) = @_;       # content to store in the node
        return {
            'VALUE' => $value,
            'LEFT'  => undef,   # no left child yet
            'RIGHT' => undef,   # no right child yet
        };
    }
Attaching a node
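[Figure: $root_ref and $someNode_ref each refer to a node (value, left, right); after the assignment, the left link of the root node points to the second node.]

    $root_ref->{LEFT} = $someNode_ref;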
Trees: in-order traversal
    traverse($theTree);

    sub traverse {
        my ($tree) = @_;
        return unless defined $tree;        # if no node, nothing to do
        traverse($tree->{LEFT});            # visit the left subtree first
        processTheNode($tree->{VALUE});     # e.g. print value
        traverse($tree->{RIGHT});           # then the right subtree
    }
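To see what order an in-order traversal produces, here is a self-contained sketch. It assumes the same VALUE/LEFT/RIGHT node layout; `make_node` and `traverse_with` are hypothetical names, and a callback stands in for `processTheNode`.

```perl
use strict;
use warnings;

# Same node layout as above: a hash with VALUE, LEFT and RIGHT.
sub make_node {
    my ($value, $left, $right) = @_;
    return { VALUE => $value, LEFT => $left, RIGHT => $right };
}

# In-order traversal that hands each value to a caller-supplied callback.
sub traverse_with {
    my ($tree, $visit) = @_;
    return unless defined $tree;             # empty subtree: nothing to do
    traverse_with($tree->{LEFT}, $visit);    # 1. everything on the left
    $visit->($tree->{VALUE});                # 2. the node itself
    traverse_with($tree->{RIGHT}, $visit);   # 3. everything on the right
    return;
}

#        4
#      /   \
#     2     6
#    / \   /
#   1   3 5
my $tree = make_node(4,
    make_node(2, make_node(1), make_node(3)),
    make_node(6, make_node(5)),
);

my @seen;
traverse_with($tree, sub { push @seen, $_[0] });
print "@seen\n";   # 1 2 3 4 5 6
```

Because smaller values sit to the left of every node in this tree, the in-order visit yields the values in ascending order.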
Trees: insertion
    sub insert {    # -- recursively builds the tree
        my ($tree, $val) = @_;
        if (!$tree) {
            # no node exists so create one; $_[0] is an alias for the
            # caller's variable, so this fills the caller's empty slot
            $_[0] = newNode($val);
            return;
        }
        # a node exists, so insert into the proper subtree
        if ($tree->{VALUE} > $val) {
            insert($tree->{LEFT}, $val);
        }
        elsif ($tree->{VALUE} < $val) {
            insert($tree->{RIGHT}, $val);
        }
        else {
            warn "dup insert of $val\n" if 0;    # duplicate: warning disabled
        }
    }
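A short usage sketch follows, repeating compact versions of `newNode` and `insert` so that it runs on its own. The helper `in_order` and the sample values are assumptions added for illustration.

```perl
use strict;
use warnings;

sub newNode { return { VALUE => $_[0], LEFT => undef, RIGHT => undef } }

sub insert {
    my ($tree, $val) = @_;
    if (!$tree) { $_[0] = newNode($val); return }   # fill the caller's empty slot
    if    ($tree->{VALUE} > $val) { insert($tree->{LEFT},  $val) }
    elsif ($tree->{VALUE} < $val) { insert($tree->{RIGHT}, $val) }
    return;                                         # duplicates are ignored
}

# Returns the values of the tree in in-order sequence.
sub in_order {
    my ($tree) = @_;
    return () unless $tree;
    return (in_order($tree->{LEFT}), $tree->{VALUE}, in_order($tree->{RIGHT}));
}

my $root;                                  # an empty tree
insert($root, $_) for (50, 30, 70, 20, 40, 60, 30);
print join(' ', in_order($root)), "\n";    # 20 30 40 50 60 70
```

The call `insert($root, ...)` works on an undefined `$root` because Perl passes arguments by alias, so the assignment to `$_[0]` reaches the caller's variable.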
Reminders
• Code examples in Readonly directory on Pinedalab
• Project coming up...
Complexity
Example: Search
• Given a list of ordered values, how do we find one? e.g.
  - Numbers in a list
  - Words in a dictionary
• The complexity depends on the data structure used to represent the set of objects and on the algorithm used to process the data structure
Big-O notation
• Big-O notation is a way to express the asymptotic time complexity of a computer algorithm.

  O(1)        constant
  O(log(n))   logarithmic
  O(n)        linear
  O(n^2)      quadratic
  O(n^c)      polynomial
  O(c^n)      exponential
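The gap between these classes becomes concrete when the step counts are computed for a few input sizes. This is only an illustrative sketch: the function name `growth` and the choice of 2 as the exponential base are assumptions.

```perl
use strict;
use warnings;

# Number of basic steps each complexity class implies for input size n.
# The exponential row uses base 2, an arbitrary choice for illustration.
sub growth {
    my ($n) = @_;
    return (
        'O(1)'      => 1,
        'O(log(n))' => log($n) / log(2),
        'O(n)'      => $n,
        'O(n^2)'    => $n ** 2,
        'O(2^n)'    => 2 ** $n,
    );
}

for my $n (10, 20, 40) {
    my %steps = growth($n);
    printf "n=%-3d log=%.1f  n^2=%d  2^n=%.0f\n",
        $n, $steps{'O(log(n))'}, $steps{'O(n^2)'}, $steps{'O(2^n)'};
}
```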
Linear search
• Represent the set of objects as a list and then sequentially search the list
• Space complexity is proportional to the number of objects.
• Time complexity is proportional to the number of objects, i.e. O(n).
Binary Search
• Represent the set of objects as a binary tree and search the tree
• Space complexity is proportional to the number of objects.
• Time complexity?
Binary search
• Represent the set of objects as a binary tree and search the tree

    sub lookup {
        my ($tree, $value) = @_;
        if (!$tree) {
            return;                                   # empty subtree: not found
        }
        elsif ($tree->{VALUE} == $value) {
            return $tree;                             # found the node
        }
        elsif ($value < $tree->{VALUE}) {
            return lookup($tree->{LEFT}, $value);     # smaller: go left
        }
        else {
            return lookup($tree->{RIGHT}, $value);    # larger: go right
        }
    }
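Here is a runnable sketch of a lookup on a small hand-built tree. It uses a condensed but equivalent form of `lookup`; the helper `leaf` and the sample values are hypothetical.

```perl
use strict;
use warnings;

sub lookup {
    my ($tree, $value) = @_;
    return unless $tree;                           # ran off the tree: not found
    return $tree if $tree->{VALUE} == $value;      # found the node
    return lookup($value < $tree->{VALUE} ? $tree->{LEFT} : $tree->{RIGHT}, $value);
}

sub leaf { return { VALUE => $_[0], LEFT => undef, RIGHT => undef } }

my $tree = { VALUE => 50,
             LEFT  => { VALUE => 30, LEFT => leaf(20), RIGHT => leaf(40) },
             RIGHT => { VALUE => 70, LEFT => leaf(60), RIGHT => undef } };

for my $wanted (40, 65) {
    my $node = lookup($tree, $wanted);
    print "$wanted: ", ($node ? "found" : "not found"), "\n";
}
```

The comparison uses `==`, so this version assumes numeric values; for words the comparisons would need `eq` and `lt` instead.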
Binary search
• The search time depends on how deep in the tree you have to go to find the object
• The depth of the tree depends on how it was constructed
• Worst case: Input was presorted
  depth = n, complexity: O(n)
• Best case: Tree is balanced
  depth = log(n), complexity: O(log(n))
• Average case: If input is random then it can be shown that
  expected depth is proportional to log(n), complexity: O(log(n))
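The effect of insertion order on depth can be measured directly. The sketch below is an illustration only: `depth` and `depth_after_inserting` are hypothetical helpers, `insert` is a compact copy of the routine shown earlier, and `shuffle` and `max` come from the core module List::Util.

```perl
use strict;
use warnings;
use List::Util qw(shuffle max);

sub insert {
    my ($tree, $val) = @_;
    if (!$tree) { $_[0] = { VALUE => $val, LEFT => undef, RIGHT => undef }; return }
    if    ($tree->{VALUE} > $val) { insert($tree->{LEFT},  $val) }
    elsif ($tree->{VALUE} < $val) { insert($tree->{RIGHT}, $val) }
    return;
}

# Number of nodes on the longest path from the root down to a leaf.
sub depth {
    my ($tree) = @_;
    return 0 unless $tree;
    return 1 + max(depth($tree->{LEFT}), depth($tree->{RIGHT}));
}

# Builds a fresh tree from the given values and reports its depth.
sub depth_after_inserting {
    my $root;
    insert($root, $_) for @_;
    return depth($root);
}

my @values = 1 .. 50;
printf "presorted input: depth %d\n", depth_after_inserting(@values);
printf "shuffled input:  depth %d\n", depth_after_inserting(shuffle @values);
```

With presorted input every new value goes to the right of the previous one, so the tree degenerates into a chain of 50 nodes; the shuffled result varies from run to run but is typically far shallower.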
Hash table search
• Calculate a number from the key
  - Performed by a hash function
• Use the number to index into an array
• If more than one key hashes to the same index (a collision)
  - Maintain a list of keys that resolve to the same hash value
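Perl's built-in hashes already handle all of this internally, but the steps can be spelled out by hand. The following is a toy sketch: the hash function (sum of character codes), the bucket count, and the names `bucket_index`, `table_store` and `table_fetch` are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

use constant BUCKETS => 8;    # deliberately small so that collisions happen

# Toy hash function (an assumption for illustration): sum of character codes.
sub bucket_index {
    my ($key) = @_;
    my $sum = 0;
    $sum += ord for split //, $key;
    return $sum % BUCKETS;
}

# Store a key/value pair; each array slot holds a list of [key, value] pairs.
sub table_store {
    my ($table, $key, $value) = @_;
    my $chain = $table->[ bucket_index($key) ] ||= [];
    for my $pair (@$chain) {
        if ($pair->[0] eq $key) { $pair->[1] = $value; return }   # overwrite
    }
    push @$chain, [$key, $value];
    return;
}

# Fetch the value for a key by scanning only the chain in its bucket.
sub table_fetch {
    my ($table, $key) = @_;
    my $chain = $table->[ bucket_index($key) ] or return;
    for my $pair (@$chain) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;
}

my @table;
table_store(\@table, 'ab', 1);
table_store(\@table, 'ba', 2);    # same character sum as 'ab': a collision
print table_fetch(\@table, 'ab'), ' ', table_fetch(\@table, 'ba'), "\n";   # 1 2
```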
NP-Completeness
• A problem is tractable if some algorithm exists that always solves the problem in a time that is proportional to some power of the length of the input. Such problems are said to be solvable in polynomial time. (Of course, if the power is 50, then the problem is practically intractable.)
• A problem with no polynomial-time algorithm is said to be intractable.
• In the 1970s the class of NP-complete problems was defined. NP stands for nondeterministic polynomial time. This class of problems has no known polynomial-time algorithms.
Salient properties of NP-complete problems
• No NP-complete problem has been proven to be solvable in polynomial time.
• No NP-complete problem has been proven to be unsolvable in polynomial time.
• All NP-complete problems are computationally equivalent in the following sense:
  - If any polynomial-time algorithm can be found to solve any NP-complete problem, then every NP-complete problem can be solved by some polynomial-time algorithm.
  - Since so many computer scientists and mathematicians have tried unsuccessfully to solve so many NP-complete problems, almost no one believes that polynomial-time algorithms exist for NP-complete problems -- but this hasn’t been proven either.
The harsh realities of life!
Most problems of interest in
bioinformatics and computational biology
are NP-complete