Searching: Binary Trees and Hash Tables

Searching: Hash Tables
Chapter 12
6/9/15
Adapted from instructor resource slides
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3
Info:
• Still grading exams. Will review answers Thursday
• Review how to handle issues with project grading
• Review Thursday’s material
• Hashing (new)
• break
• Start sorting
• Review project 2
Evolution of Reusability, Genericity
• Major theme in development of programming
languages
– Reuse code
– Avoid repeatedly reinventing the wheel
• Trend contributing to this
– Use of generic code
– Can be used with different types of data
Function Genericity
Overloading and Templates
• Initially code was reusable by encapsulating it within functions
• Example: lines of code to swap values stored in two variables
– Instead of rewriting those 3 lines
– Place in a function
void swap (int & first, int & second)
{
  int temp = first;
  first = second;
  second = temp;
}
– Then call swap(x,y);
Template Mechanism
• Declare a type parameter
– also called a type placeholder
• Use it in the function instead of a specific type.
– This requires a different kind of parameter list:
void Swap(______ & first, ______ & second)
{
________ temp = first;
first = second;
second = temp;
}
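One way the blanks above could be filled in is sketched here. The parameter name ElementType is an assumption chosen for illustration; any identifier works, and the template prefix line is what declares it as a type parameter.

```cpp
#include <string>

// Function template: ElementType is the type parameter (type placeholder).
// The name ElementType is illustrative, not prescribed by the slides.
template <typename ElementType>
void Swap(ElementType & first, ElementType & second)
{
    ElementType temp = first;   // a temporary of whatever type is being swapped
    first = second;
    second = temp;
}
```

A call such as Swap(x, y) with two ints makes the compiler generate an int version; a call with two strings generates a separate string version.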
Instantiating Class Templates
• Instantiate it by using declaration of form
ClassName<Type> object;
• Passes Type as an argument to the class template definition.
• Examples:
Stack<int> intSt;
Stack<string> stringSt;
• Compiler will generate two distinct definitions of Stack
– two instances
– one for ints and one for strings.
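To make the instantiation concrete, here is a deliberately small Stack class template. It is a sketch for illustration only, not the Stack class from the text; storing the elements in a std::vector and the member names are assumptions.

```cpp
#include <string>
#include <vector>

// Minimal Stack class template (illustrative; std::vector storage is an assumption).
template <typename ElementType>
class Stack
{
public:
    bool empty() const { return items.empty(); }
    void push(const ElementType & value) { items.push_back(value); }
    void pop() { items.pop_back(); }                           // precondition: !empty()
    const ElementType & top() const { return items.back(); }  // precondition: !empty()
private:
    std::vector<ElementType> items;
};
```

Declaring Stack<int> intSt; and Stack<std::string> stringSt; then produces two separate instances of the class, one per element type.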
STL (Standard Template Library)
• A library of class and function templates
Components:
1. Containers:
• Generic "off-the-shelf" class templates for storing collections of data
2. Algorithms:
• Generic "off-the-shelf" function templates for operating on containers
3. Iterators:
• Generalized "smart" pointers that allow algorithms to operate on almost any container
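A short sketch of the three components working together: a container (std::vector), two algorithms (std::sort and std::find), and the iterators returned by begin() and end() that connect them. The function names sortAscending and contains are hypothetical helpers introduced here.

```cpp
#include <algorithm>
#include <vector>

// Sorts v into ascending order. The algorithm only sees the iterator range.
void sortAscending(std::vector<int> & v)
{
    std::sort(v.begin(), v.end());
}

// Returns true if item occurs in v. std::find returns v.end() when nothing matches.
bool contains(const std::vector<int> & v, int item)
{
    return std::find(v.begin(), v.end(), item) != v.end();
}
```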
The vector Container
• A type-independent pattern for an array class
– capacity can expand
– self contained
• Declaration
template <typename T>
class vector
{ . . . } ;
vector Operations
• Information about a vector's contents
– v.size()
– v.empty()
– v.capacity()
– v.reserve()
• Adding, removing, accessing elements
– v.push_back()
– v.pop_back()
– v.front()
– v.back()
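The operations listed above can be exercised in a few lines. This is only a usage sketch; buildSample is a hypothetical function name, and the values are arbitrary.

```cpp
#include <vector>

// Builds a small vector using the operations from the list above.
std::vector<int> buildSample()
{
    std::vector<int> v;      // size() is 0 and empty() is true
    v.reserve(10);           // capacity() is now at least 10; size() is still 0
    v.push_back(5);
    v.push_back(8);
    v.push_back(13);         // size() is 3
    v.pop_back();            // removes 13; size() is 2
    return v;                // front() is 5, back() is 8
}
```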
Increasing Capacity of a Vector
• When vector v becomes full
– capacity increased automatically when item added
• Algorithm to increase capacity of vector<T>
– Allocate new array to store vector's elements
– use T copy constructor to copy existing elements to new array
– Store item being added in new array
– Destroy old array in vector<T>
– Make new array the vector<T>'s storage array
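The capacity-increase steps above can be sketched with a raw array. This is not std::vector's actual implementation: appendWithGrowth is a hypothetical helper, doubling the capacity is an assumption, and for brevity the elements are copied by assignment into default-constructed slots rather than with T's copy constructor.

```cpp
#include <cstddef>

// Appends item to the array at storage, growing it first if it is full.
template <typename T>
void appendWithGrowth(T *& storage, std::size_t & size,
                      std::size_t & capacity, const T & item)
{
    if (size == capacity)
    {
        // Doubling is an assumed growth policy.
        std::size_t newCapacity = (capacity == 0) ? 1 : 2 * capacity;
        T * newArray = new T[newCapacity];       // allocate new array
        for (std::size_t i = 0; i < size; ++i)
            newArray[i] = storage[i];            // copy existing elements
        newArray[size] = item;                   // store item being added
        delete [] storage;                       // destroy old array
        storage = newArray;                      // new array becomes the storage
        capacity = newCapacity;
    }
    else
    {
        storage[size] = item;                    // room available: no reallocation
    }
    ++size;
}
```

Starting from an empty array and appending five items, the capacity in this sketch goes 1, 2, 4, 8, so only three of the five appends pay the copying cost.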
Iterators
• Each STL container declares an iterator type
– can be used to define iterator objects
– Iterators are a generalization of pointers that allow a
C++ program to work with different data structures
(containers) in a uniform manner
• To declare an iterator object
– the identifier iterator must be preceded by
• name of container
• scope operator ::
• Example:
vector<int>::iterator vecIter = v.begin();
• Would define vecIter as an iterator positioned at the
first element of v
Iterators
Contrast use of subscript vs. use of iterator
Subscript version:
ostream & operator<<(ostream & out, const vector<double> & v)
{
  for (int i = 0; i < v.size(); i++)
    out << v[i] << " ";
  return out;
}
Iterator version of the loop (v is const here, so a const_iterator is required):
for (vector<double>::const_iterator it = v.begin();
     it != v.end(); it++)
  out << *it << " ";
Iterator Functions
• Note Table 9-5
• Note the capability of the last two groupings
– Possible to insert, erase elements of a vector
anywhere in the vector
– Must use iterators to do this
– Note also these operations are as inefficient as
for arrays due to the shifting required
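As a sketch of inserting and erasing at an arbitrary position, the helpers below turn an index into an iterator with v.begin() + index. The names insertAt and eraseAt are hypothetical; the insert() and erase() members they call are the standard vector operations.

```cpp
#include <vector>

// Inserts value before position index (0 <= index <= v.size()).
// Elements from index onward are shifted one place to the right.
void insertAt(std::vector<int> & v, int index, int value)
{
    v.insert(v.begin() + index, value);
}

// Erases the element at position index (0 <= index < v.size()).
// Later elements are shifted one place to the left.
void eraseAt(std::vector<int> & v, int index)
{
    v.erase(v.begin() + index);
}
```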
Contrast Vectors and Arrays
Vectors
• Capacity can increase
• A self contained object
• Is a class template (No specific type)
• Has function members to do tasks
Arrays
• Fixed size, cannot be changed during execution
• Cannot "operate" on itself
• Bound to specific type
• Must "re-invent the wheel" for most actions
STL's deque Class Template
• Has the same operations as vector<T> except …
– there is no capacity() and no reserve()
• Has two new operations:
– d.push_front(value);
Push copy of value at front of d
– d.pop_front();
Remove value at the front of d
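A brief usage sketch of the two front operations on std::deque. The function name buildDeque and the values are illustrative only.

```cpp
#include <deque>

// Builds the deque 1 2 3 using operations at both ends.
std::deque<int> buildDeque()
{
    std::deque<int> d;
    d.push_back(2);
    d.push_back(3);
    d.push_front(1);   // d is now 1 2 3
    d.push_front(0);   // d is now 0 1 2 3
    d.pop_front();     // removes 0; pop_front takes no argument and returns nothing
    return d;
}
```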
vector vs. deque
vector
• Capacity of a vector must be increased
• It must copy the objects from the old vector to the new vector
• It must destroy each object in the old vector
• A lot of overhead!
deque
• With deque this copying, creating, and destroying is avoided.
• Once an object is constructed, it can stay in the same memory locations as long as it exists
– If insertions and deletions take place at the ends of the deque.
vector vs. deque
• Unlike vectors, a deque isn't stored in a
single varying-sized block of memory, but
rather in a collection of fixed-size blocks
(typically, 4K bytes).
• One of its data members is essentially an
array map whose elements point to the
locations of these blocks.
Linear Search
Vector based search function
template <typename t>
void LinearSearch (const vector<t> &v,
                   const t &item, bool &found,
                   int &loc)
{
  found = false; loc = 0;
  // Scan from the front until item is found or the end of v is reached
  while (loc < static_cast<int>(v.size()) && !found)
  {
    if (item == v[loc])
      found = true;    // loc is the position of item
    else
      loc++;
  }
}
Binary Search
Binary search function for vector
template <typename t>
void BinarySearch (const vector<t> &v,
                   const t &item, bool &found, int &loc)
{
  found = false;
  int first = 0;
  int last = static_cast<int>(v.size()) - 1;
  // Precondition: v is sorted in ascending order
  while (first <= last && !found)
  {
    loc = (first + last) / 2;
    if (item < v[loc])
      last = loc - 1;      // continue in the left half
    else if (item > v[loc])
      first = loc + 1;     // continue in the right half
    else
      /* item == v[loc] */
      found = true;
  }
}
Binary Search
• Usually outperforms a linear search
• Disadvantage:
– Requires sequential storage
– Not appropriate for linked lists (Why?)
• It is possible to use a linked structure which
can be searched in a binary-like manner
Trees
• Tree terminology
– Root node
– Children of the parent (3)
– Leaf nodes
– Siblings to each other
Binary Trees
• Each node has at most two children
• Useful in modeling processes where
– a comparison or experiment has exactly two
possible outcomes
– the test is performed repeatedly
• Example
– multiple coin tosses
– encoding/decoding messages in dots and
dashes such as Morse code
Array Representation of Binary Trees
• Works OK for complete trees, not for sparse trees
Linked Representation of Binary Trees
• Uses space more efficiently
• Provides additional flexibility
• Each node has two links
– one to the left child of the node
– one to the right child of the node
– if no child node exists for a node, the link is set to NULL
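A node with the two links described above might be declared as follows. The struct name BinNode follows the name used later in these slides; the member names data, left, and right are assumptions.

```cpp
#include <cstddef>

// Node of a linked binary tree (member names are illustrative).
template <typename DataType>
struct BinNode
{
    DataType data;
    BinNode * left;    // link to the left child, NULL if there is none
    BinNode * right;   // link to the right child, NULL if there is none

    explicit BinNode(const DataType & item)
        : data(item), left(NULL), right(NULL) {}
};
```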
Binary Trees as Recursive Data Structures
• A binary tree is either empty … (anchor)
or
• Consists of (inductive step)
– a node called the root
– root has pointers to two disjoint binary (sub)trees called …
• right (sub)tree
• left (sub)tree
– each of which is either empty … or …
ADT Binary Search Tree (BST)
• Collection of Data Elements
– binary tree
– for each node x,
• value in left child of x ≤ value in x ≤ value in right child of x
• Basic operations
– Construct an empty BST
– Determine if BST is empty
– Search BST for given item
ADT Binary Search Tree (BST)
• Basic operations (ctd)
– Insert a new item in the BST
• Maintain the BST property
– Delete an item from the BST
• Maintain the BST property
– Traverse the BST
• Visit each node exactly once
• The inorder traversal must visit the values in the nodes in ascending order
• View BST class template, Fig. 12-1
BST Traversals
• Note that recursive calls must be made
– To left subtree
– To right subtree
• Must use two functions
– Public method to send message to BST object
– Private auxiliary method that can access BinNodes and pointers within these nodes
• Similar solution to graphic output
– Public graphic method
– Private graphAux method
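The public/private pairing described above could look like the following sketch. It is a cut-down BST of ints written for illustration, not the class template of Fig. 12-1; the names myRoot, inorderAux, insertAux, and destroyAux are assumptions, and the traversal collects values into a vector instead of printing them.

```cpp
#include <cstddef>
#include <vector>

class BST
{
public:
    BST() : myRoot(NULL) {}
    ~BST() { destroyAux(myRoot); }

    void insert(int item) { insertAux(myRoot, item); }

    // Public method: callers never see nodes or pointers.
    void inorder(std::vector<int> & out) const { inorderAux(myRoot, out); }

private:
    struct BinNode
    {
        int data;
        BinNode * left;
        BinNode * right;
        explicit BinNode(int item) : data(item), left(NULL), right(NULL) {}
    };
    BinNode * myRoot;

    // Private auxiliary method: the recursion needs a subtree pointer parameter.
    static void inorderAux(const BinNode * subtreeRoot, std::vector<int> & out)
    {
        if (subtreeRoot == NULL) return;         // empty subtree: nothing to visit
        inorderAux(subtreeRoot->left, out);      // recursive call on left subtree
        out.push_back(subtreeRoot->data);        // visit the node itself
        inorderAux(subtreeRoot->right, out);     // recursive call on right subtree
    }

    static void insertAux(BinNode *& subtreeRoot, int item)
    {
        if (subtreeRoot == NULL) subtreeRoot = new BinNode(item);
        else if (item < subtreeRoot->data) insertAux(subtreeRoot->left, item);
        else if (item > subtreeRoot->data) insertAux(subtreeRoot->right, item);
        // equal: item is already in the tree, so nothing is added
    }

    static void destroyAux(BinNode * subtreeRoot)
    {
        if (subtreeRoot == NULL) return;
        destroyAux(subtreeRoot->left);
        destroyAux(subtreeRoot->right);
        delete subtreeRoot;
    }

    BST(const BST &);               // copying disabled to keep the sketch short
    BST & operator=(const BST &);
};
```

Inserting 50, 30, 70, 20, 40 and then calling inorder() yields the values in ascending order, which is the property the BST ADT requires of an inorder traversal.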
BST Searches
• Search begins at root
– If that is desired item, done
• If item is less, move down left subtree
• If item searched for is greater, move down right subtree
• If item is not found, we will run into an empty subtree
• View search()
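The search steps above can be sketched as a loop over plain nodes. This is not the text's search() member function; the BinNode layout and the free function are assumptions made to keep the example short.

```cpp
#include <cstddef>

// Plain node of a BST of ints (layout assumed for illustration).
struct BinNode
{
    int data;
    BinNode * left;
    BinNode * right;
};

// Returns true if item is in the BST rooted at root.
bool search(const BinNode * root, int item)
{
    const BinNode * locptr = root;            // search begins at the root
    while (locptr != NULL)
    {
        if (item < locptr->data)
            locptr = locptr->left;            // item is less: move down left subtree
        else if (item > locptr->data)
            locptr = locptr->right;           // item is greater: move down right subtree
        else
            return true;                      // desired item found
    }
    return false;                             // ran into an empty subtree
}
```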
Inserting into a BST
• Insert function
– Uses modified version of search to locate insertion location or already existing item
– Pointer parent trails search pointer locptr, keeps track of parent node
– Thus new node can be attached to BST in proper place
• View insert() function
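A sketch of insertion with a trailing parent pointer, using the names locptr and parent from the description above. It is not the text's insert() function: the int-only BinNode, the bool return value, and the destroy helper are assumptions.

```cpp
#include <cstddef>

struct BinNode
{
    int data;
    BinNode * left;
    BinNode * right;
    explicit BinNode(int item) : data(item), left(NULL), right(NULL) {}
};

// Inserts item into the BST rooted at root.
// Returns false (and changes nothing) if item is already present.
bool insert(BinNode *& root, int item)
{
    BinNode * locptr = root;    // search pointer
    BinNode * parent = NULL;    // trails locptr, one level behind
    while (locptr != NULL)
    {
        parent = locptr;
        if (item < locptr->data)       locptr = locptr->left;
        else if (item > locptr->data)  locptr = locptr->right;
        else                           return false;   // already in the tree
    }
    BinNode * newNode = new BinNode(item);
    if (parent == NULL)                root = newNode;          // tree was empty
    else if (item < parent->data)      parent->left = newNode;
    else                               parent->right = newNode;
    return true;
}

// Frees every node of the tree (helper for cleanup).
void destroy(BinNode * root)
{
    if (root == NULL) return;
    destroy(root->left);
    destroy(root->right);
    delete root;
}
```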
Recursive Deletion
Three possible cases to delete a node, x, from a BST
1. The node, x, is a leaf
Recursive Deletion
2. The node, x, has one child
Recursive Deletion
3. The node, x, has two children
– Replace contents of x with inorder successor
– Delete node pointed to by xSucc as described for cases 1 and 2
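The three cases can be handled by one recursive function, sketched below. The name removeItem and the int-only BinNode are assumptions; xSucc is the name used above for the inorder successor, which is the leftmost node of x's right subtree.

```cpp
#include <cstddef>

struct BinNode
{
    int data;
    BinNode * left;
    BinNode * right;
    explicit BinNode(int item) : data(item), left(NULL), right(NULL) {}
};

// Removes item from the BST rooted at root, if it is present.
void removeItem(BinNode *& root, int item)
{
    if (root == NULL) return;                        // item not in the tree
    if (item < root->data)
        removeItem(root->left, item);
    else if (item > root->data)
        removeItem(root->right, item);
    else if (root->left != NULL && root->right != NULL)
    {
        // Case 3: two children. Find the inorder successor.
        BinNode * xSucc = root->right;
        while (xSucc->left != NULL)
            xSucc = xSucc->left;
        root->data = xSucc->data;                    // replace contents of x
        removeItem(root->right, xSucc->data);        // successor is case 1 or 2
    }
    else
    {
        // Cases 1 and 2: leaf, or exactly one child.
        BinNode * old = root;
        root = (root->left != NULL) ? root->left : root->right;
        delete old;
    }
}
```

Because root is passed by reference, assigning to it in the last branch updates the parent's link directly, so no separate parent pointer is needed in this recursive form.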
Problem of Lopsidedness
• Trees can be totally lopsided
– Suppose each node has a right child only
– Degenerates into a linked list
• Processing time affected by "shape" of tree
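The effect of insertion order on shape can be measured with a small sketch. The helper names insert, height, and destroy and the int-only BinNode are assumptions; height here counts levels, so an empty tree has height 0.

```cpp
#include <algorithm>
#include <cstddef>

struct BinNode
{
    int data;
    BinNode * left;
    BinNode * right;
    explicit BinNode(int item) : data(item), left(NULL), right(NULL) {}
};

// Ordinary BST insertion with no rebalancing; duplicates are ignored.
void insert(BinNode *& root, int item)
{
    if (root == NULL) root = new BinNode(item);
    else if (item < root->data) insert(root->left, item);
    else if (item > root->data) insert(root->right, item);
}

// Number of levels in the tree.
int height(const BinNode * root)
{
    if (root == NULL) return 0;
    return 1 + std::max(height(root->left), height(root->right));
}

void destroy(BinNode * root)
{
    if (root == NULL) return;
    destroy(root->left);
    destroy(root->right);
    delete root;
}
```

Inserting 1 through 7 in ascending order gives every node a right child only, so the height is 7 and a search behaves like a linked-list scan. Inserting the same values in the order 4, 2, 6, 1, 3, 5, 7 gives a tree of height 3.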
Hash Tables
• Recall order of magnitude of searches
– Linear search O(n)
– Binary search O(log₂ n)
– Balanced binary tree search O(log₂ n)
– Unbalanced binary tree can degrade to O(n)
Hash Tables
• In some situations faster search is needed
– Solution is to use a hash function
– Value of key field given to hash function
– Location in a hash table is calculated
Hash Functions
• Simple function could be to mod the value of the key by the size of the table
– H(x) = x % tableSize
• Note that we have gained speed at the cost of wasted space
– Table must be considerably larger than number of items anticipated
– Suggested: 1.5 to 2 times the number of items
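The function H(x) = x % tableSize is a one-liner in C++. The name hashKey and the choice of an unsigned key type are assumptions for this sketch.

```cpp
#include <cstddef>

// H(x) = x % tableSize for a nonnegative integer key. Precondition: tableSize > 0.
std::size_t hashKey(unsigned int key, std::size_t tableSize)
{
    return key % tableSize;
}
```

With a table of size 11, the keys 25 and 36 both map to location 3, which is the situation the following slides call a collision.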
Hash Functions
• Observe the problem with same value returned by h(x) for different values of x
– Called collisions
• A simple solution is linear probing
– Empty slots marked with -1
– Linear search begins at collision location
– Continues until empty slot found for insertion
Hash Functions
• When retrieving a value, linear probe until found
– If empty slot encountered then value is not in table
• If deletions permitted
– Slot can be marked so it will not be empty and cause an invalid linear probe
– Ex. -1 for unused slots, -2 for slots which used to contain data
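Insertion, retrieval, and deletion with linear probing can be sketched together, using -1 and -2 as the slot markers mentioned above. The class name ProbingTable and its members are hypothetical; the sketch assumes nonnegative int keys, a table capacity greater than zero, and no duplicate insertions.

```cpp
#include <cstddef>
#include <vector>

const int EMPTY_SLOT = -1;     // slot has never been used
const int DELETED_SLOT = -2;   // slot used to contain data

class ProbingTable
{
public:
    explicit ProbingTable(std::size_t capacity) : slots(capacity, EMPTY_SLOT) {}

    // Inserts key at the first free slot at or after its home location.
    // Returns false if every slot is occupied.
    bool insert(int key)
    {
        std::size_t start = home(key);
        for (std::size_t i = 0; i < slots.size(); ++i)
        {
            std::size_t pos = (start + i) % slots.size();   // wrap around the end
            if (slots[pos] == EMPTY_SLOT || slots[pos] == DELETED_SLOT)
            {
                slots[pos] = key;
                return true;
            }
        }
        return false;
    }

    // Returns the slot index holding key, or -1 if key is not in the table.
    int find(int key) const
    {
        std::size_t start = home(key);
        for (std::size_t i = 0; i < slots.size(); ++i)
        {
            std::size_t pos = (start + i) % slots.size();
            if (slots[pos] == EMPTY_SLOT) return -1;        // never-used slot ends the probe
            if (slots[pos] == key) return static_cast<int>(pos);
            // a DELETED_SLOT marker is skipped so the probe keeps going
        }
        return -1;
    }

    // Marks the slot holding key as deleted. Returns false if key is absent.
    bool remove(int key)
    {
        int pos = find(key);
        if (pos < 0) return false;
        slots[pos] = DELETED_SLOT;
        return true;
    }

private:
    std::size_t home(int key) const
    {
        return static_cast<std::size_t>(key) % slots.size();
    }
    std::vector<int> slots;
};
```

With capacity 7, the keys 10, 17, and 24 all have home location 3 and end up in slots 3, 4, and 5. After 17 is removed, 24 is still found because slot 4 holds the deleted marker rather than the empty marker.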
Collision Reduction Strategies
• Strategies for improved performance
– Increase table capacity (fewer collisions)
– Use different collision resolution technique
– Devise different hash function
Collision Reduction Strategies
• Hash table capacity
– Size of table should be 1.5 to 2 times the number of items to be stored
– Otherwise probability of collisions is too high
Collision Reduction Strategies
• Linear probing can result in primary clustering
• Consider quadratic probing
– Probe sequence from location i is
i + 1, i – 1, i + 4, i – 4, i + 9, i – 9, …
– Secondary clusters can still form
• Double hashing
– Use a second hash function to determine probe sequence
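The quadratic probe sequence above can be generated as follows. The function name quadraticProbes is hypothetical, and wrapping each location modulo the table size is an assumption, since the slide lists the offsets without saying how they wrap.

```cpp
#include <cstddef>
#include <vector>

// Returns the first count probe locations after home location i:
// i + 1, i - 1, i + 4, i - 4, i + 9, i - 9, ... each taken modulo tableSize.
// Preconditions: tableSize > 0 and i < tableSize.
std::vector<std::size_t> quadraticProbes(std::size_t i, std::size_t tableSize,
                                         std::size_t count)
{
    std::vector<std::size_t> probes;
    for (std::size_t k = 1; probes.size() < count; ++k)
    {
        std::size_t offset = (k * k) % tableSize;
        probes.push_back((i + offset) % tableSize);                  // i + k*k
        if (probes.size() < count)
            probes.push_back((i + tableSize - offset) % tableSize);  // i - k*k
    }
    return probes;
}
```

For home location 5 in a table of size 11, the first six probes are 6, 4, 9, 1, 3, 7.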
Collision Reduction Strategies
• Chaining
– Table is a list or vector of head nodes to linked lists
– When item hashes to location, it is added to that linked list
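A chained table can be sketched as a vector of std::list objects, one list per location. The class name ChainedTable, its members, and the use of key % capacity as the hash function are assumptions for illustration.

```cpp
#include <cstddef>
#include <list>
#include <vector>

class ChainedTable
{
public:
    // Precondition: capacity > 0.
    explicit ChainedTable(std::size_t capacity) : buckets(capacity) {}

    // Adds key to the linked list at its hash location.
    void insert(unsigned int key)
    {
        buckets[key % buckets.size()].push_back(key);
    }

    // Searches only the one list that key hashes to.
    bool contains(unsigned int key) const
    {
        const std::list<unsigned int> & chain = buckets[key % buckets.size()];
        for (std::list<unsigned int>::const_iterator it = chain.begin();
             it != chain.end(); ++it)
        {
            if (*it == key) return true;
        }
        return false;
    }

    // Number of items stored at the given location.
    std::size_t chainLength(std::size_t location) const
    {
        return buckets[location].size();
    }

private:
    std::vector<std::list<unsigned int> > buckets;
};
```

Colliding keys simply share a list, so no probing is needed and the table never becomes full; the cost is that a long chain degrades toward a linear search of that chain.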
Improving the Hash Function
• Ideal hash function
– Simple to evaluate
– Scatters items uniformly throughout table
• Modulo arithmetic not so good for strings
– Possible to manipulate numeric (ASCII) value of first and last characters of a name
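One possible way to use the first and last characters is sketched below. The slide does not say how the two codes are combined, so adding them and taking the remainder by the table size is an assumption, as is the function name hashName.

```cpp
#include <cstddef>
#include <string>

// Hashes a name from the character codes of its first and last characters.
// Combining by sum-then-mod is an assumed choice, not taken from the slides.
// Preconditions: name is not empty and tableSize > 0.
std::size_t hashName(const std::string & name, std::size_t tableSize)
{
    unsigned int first = static_cast<unsigned char>(name[0]);
    unsigned int last = static_cast<unsigned char>(name[name.size() - 1]);
    return (first + last) % tableSize;
}
```

Under ASCII, "BROWN" gives (66 + 78) % 50 = 44 for a table of size 50. Any other name starting with B and ending with N, such as "BEN", lands on the same location, so this function is simple to evaluate but does not scatter items very uniformly.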