Exam #2 Review
Evolution of Reusability, Genericity


Major theme in development of programming languages

Reuse code

Avoid repeatedly reinventing the wheel
Trend contributing to this

Use of generic code

Can be used with different types of data
Function Genericity
Overloading and Templates

Initially code was reusable by encapsulating it within functions

Example lines of code to swap values stored in two variables

Instead of rewriting those 3 lines

Place in a function
void swap (int & first, int & second)
{ int temp = first;
first = second;
second = temp; }

Then call
swap(x,y);
Template Mechanism

Declare a type parameter


also called a type placeholder
Use it in the function instead of a specific type.

This requires a different kind of parameter list:
void Swap(______ & first, ______ & second)
{
________ temp = first;
first = second;
second = temp;
}
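One possible way to fill in the blanks, offered as a sketch: the type parameter is named T here, which is a choice made for this example rather than something the slide specifies.

```cpp
#include <string>

// T is a type parameter (placeholder); the compiler substitutes the
// actual type at each call, e.g. int or std::string.
template <typename T>
void Swap(T& first, T& second)
{
    T temp = first;
    first = second;
    second = temp;
}
```

A call such as Swap(x, y) with two int variables, or with two string variables, uses the same source code; the compiler generates a separate function for each type.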
Instantiating Class Templates

Instantiate it by using declaration of form
ClassName<Type> object;

Passes Type as an argument to the class template definition.

Examples:
Stack<int> intSt;
Stack<string> stringSt;

Compiler will generate two distinct definitions of Stack (two instances): one for ints and one for strings.
STL (Standard Template Library)
A library of class and function templates

Components:
1. Containers:
• Generic "off-the-shelf" class templates for storing collections of data
2. Algorithms:
• Generic "off-the-shelf" function templates for operating on containers
3. Iterators:
• Generalized "smart" pointers that allow algorithms to operate on almost any container
The vector Container

A type-independent pattern for an array class
• capacity can expand
• self contained
Declaration
template <typename T>
class vector
{
. . .
} ;
vector Operations


Information about a vector's contents

v.size()

v.empty()

v.capacity()

v.reserve()
Adding, removing, accessing elements

v.push_back()

v.pop_back()

v.front()

v.back()
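The operations listed above can be tried out with a short function. This is only an illustrative sketch; the function name vectorDemo and the values used are made up for the example.

```cpp
#include <vector>

// Exercises the vector operations listed above and returns the result.
std::vector<int> vectorDemo()
{
    std::vector<int> v;   // empty: v.empty() is true, v.size() is 0
    v.reserve(10);        // capacity becomes at least 10; size is still 0
    v.push_back(5);
    v.push_back(7);
    v.push_back(9);       // v holds 5 7 9
    v.front() = 1;        // front() and back() return references
    v.pop_back();         // removes 9; v holds 1 7
    return v;
}
```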
Increasing Capacity of a Vector

When vector v becomes full


capacity increased automatically when item added
Algorithm to increase capacity of vector<T>

Allocate new array to store vector's elements

use T copy constructor to copy existing elements to new array

Store item being added in new array

Destroy old array in vector<T>

Make new array the vector<T>'s storage array
Iterators


Each STL container declares an iterator type

can be used to define iterator objects

Iterators are a generalization of pointers that allow a C++ program
to work with different data structures (containers) in a uniform
manner
To declare an iterator object, the identifier iterator must be preceded by

the name of the container

the scope operator ::

Example:
vector<int>::iterator vecIter = v.begin();

This would define vecIter as an iterator positioned at the first element of v
Iterators
Contrast use of subscript vs. use of iterator
Subscript version:
ostream & operator<<(ostream & out, const vector<double> & v)
{
  for (size_t i = 0; i < v.size(); i++)
    out << v[i] << " ";
  return out;
}
Iterator version of the loop (a const_iterator is needed because v is const):
for (vector<double>::const_iterator it = v.begin();
     it != v.end(); it++)
  out << *it << " ";
Iterator Functions

Note Table 9-5

Note the capability of the last two groupings

Possible to insert, erase elements of a vector anywhere in the vector

Must use iterators to do this

Note also these operations are as inefficient as for arrays due to the
shifting required
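As a hedged illustration of inserting and erasing through iterators, the two helper functions below (insertAt and eraseAt are hypothetical names, not from Table 9-5) wrap the standard vector insert() and erase() members.

```cpp
#include <vector>

// Inserts value before position pos (0-based) in a copy of v.
// pos is assumed to be in the range [0, v.size()].
std::vector<int> insertAt(std::vector<int> v, int pos, int value)
{
    v.insert(v.begin() + pos, value);   // elements from pos onward shift right
    return v;
}

// Erases the element at position pos (0-based) in a copy of v.
// pos is assumed to be in the range [0, v.size() - 1].
std::vector<int> eraseAt(std::vector<int> v, int pos)
{
    v.erase(v.begin() + pos);           // elements after pos shift left
    return v;
}
```

Both calls take an iterator (here v.begin() + pos) rather than an index, and both are linear in the number of elements that must be shifted.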
Contrast Vectors and Arrays
Vectors
• Capacity can increase
• A self contained object
• Is a class template (no specific type)
• Has function members to do tasks
Arrays
• Fixed size, cannot be changed during execution
• Cannot "operate" on itself
• Bound to specific type
• Must "re-invent the wheel" for most actions
STL's deque Class Template

Has the same operations as vector<T> except there is no capacity() and no reserve()

Has two new operations:
• d.push_front(value);
  Push copy of value at front of d
• d.pop_front();
  Remove value at the front of d
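A small sketch of the two front operations, using std::deque from the standard library. The function name dequeDemo and the values are illustrative only.

```cpp
#include <deque>

// Exercises push_front and pop_front alongside push_back.
std::deque<int> dequeDemo()
{
    std::deque<int> d;
    d.push_back(2);     // d: 2
    d.push_back(3);     // d: 2 3
    d.push_front(1);    // d: 1 2 3
    d.pop_front();      // d: 2 3   (pop_front takes no argument)
    d.push_front(0);    // d: 0 2 3
    return d;
}
```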
vector vs. deque
vector
• Capacity of a vector must be increased
• It must copy the objects from the old vector to the new vector
• It must destroy each object in the old vector
• A lot of overhead!
deque
• With deque this copying, creating, and destroying is avoided.
• Once an object is constructed, it can stay in the same memory locations as long as it exists
  – If insertions and deletions take place at the ends of the deque.
vector vs. deque

Unlike vectors, a deque isn't stored in a single varying-sized block of
memory, but rather in a collection of fixed-size blocks (typically, 4K
bytes).

One of its data members is essentially an array map whose elements
point to the locations of these blocks.
Linear Search
Vector based search function
template <typename t>
void LinearSearch (const vector<t> &v,
                   const t &item, bool &found,
                   int &loc)
{
  found = false; loc = 0;
  int n = static_cast<int>(v.size());
  // Stop when the item is found or the end of the vector is reached
  while (loc < n && !found)
  {
    if (item == v[loc])
      found = true;
    else loc++;
  }
}
Binary Search
Binary search function for vector
// v must be sorted in ascending order
template <typename t>
void BinarySearch (const vector<t> &v,
                   const t &item, bool &found, int &loc)
{
  found = false;
  int first = 0;
  int last = static_cast<int>(v.size()) - 1;
  while (first <= last && !found)
  {
    loc = (first + last) / 2;
    if (item < v[loc])
      last = loc - 1;      // continue in the left half
    else if (item > v[loc])
      first = loc + 1;     // continue in the right half
    else /* item == v[loc] */
      found = true;
  }
}
Binary Search

Usually outperforms a linear search

Disadvantage:


Requires a sequential storage

Not appropriate for linked lists (Why?)
It is possible to use a linked structure which can be searched in a
binary-like manner
Trees

Tree terminology
• Root node
• Children of the parent (3)
• Siblings to each other
• Leaf nodes
Binary Trees

Each node has at most two children

Useful in modeling processes where


a comparison or experiment has exactly two possible outcomes

the test is performed repeatedly
Example

multiple coin tosses

encoding/decoding messages in dots and dashes such as Morse code
Array Representation of Binary Trees

Works OK for complete trees, not for sparse trees
Linked Representation of Binary Trees

Uses space more efficiently

Provides additional flexibility

Each node has two links

one to the left child of the node

one to the right child of the node

if no child node exists for a node, the link is set to NULL
Binary Trees as Recursive Data Structures

A binary tree is either empty … (anchor)
or
Consists of (inductive step)

a node called the root

root has pointers to two disjoint binary (sub)trees called …

right (sub)tree

left (sub)tree

Each of which is either empty … or …
ADT Binary Search Tree (BST)

Collection of Data Elements

binary tree

for each node x,
value in left child of x ≤ value in x ≤ value in right child of x

Basic operations

Construct an empty BST

Determine if BST is empty

Search BST for given item
ADT Binary Search Tree (BST)

Basic operations (ctd)

Insert a new item in the BST
• Maintain the BST property

Delete an item from the BST
• Maintain the BST property

Traverse the BST
• Visit each node exactly once
• The inorder traversal must visit the values in the nodes in ascending order

View BST class template, Fig. 12-1
BST Traversals



Note that recursive calls must be
made

To left subtree

To right subtree
Must use two functions

Public method to send message to BST
object

Private auxiliary method that can access
BinNodes and pointers within these nodes
Similar solution to graphic output

Public graphic method

Private graphAux method
BST Searches

Search begins at root





If that is desired item, done
If item is less, move down
left subtree
If item searched for is greater, move down right
subtree
If item is not found, we
will run into an empty subtree
View search()
Inserting into a BST

Insert function
• Uses modified version of search to locate insertion location or already existing item
• Pointer parent trails search pointer locptr, keeps track of parent node
• Thus new node can be attached to BST in proper place

View insert() function
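The search, insert, and inorder traversal described above can be sketched in one small class. This is not the textbook's BST template (Fig. 12-1); the class name IntBST is hypothetical and it stores ints only, while BinNode, locptr, and parent follow the names used on the slides.

```cpp
#include <vector>

// A minimal BST of ints, for illustration only.
class IntBST
{
public:
    IntBST() : root(nullptr) {}
    ~IntBST() { destroy(root); }
    IntBST(const IntBST&) = delete;
    IntBST& operator=(const IntBST&) = delete;

    bool empty() const { return root == nullptr; }

    // Search begins at the root and moves left or right at each node.
    bool search(int item) const
    {
        const BinNode* locptr = root;
        while (locptr != nullptr) {
            if (item < locptr->data)      locptr = locptr->left;
            else if (locptr->data < item) locptr = locptr->right;
            else                          return true;
        }
        return false;                     // ran into an empty subtree
    }

    // Modified search: parent trails locptr so the new node can be attached.
    void insert(int item)
    {
        BinNode* locptr = root;
        BinNode* parent = nullptr;
        while (locptr != nullptr) {
            parent = locptr;
            if (item < locptr->data)      locptr = locptr->left;
            else if (locptr->data < item) locptr = locptr->right;
            else                          return;   // item already in the tree
        }
        BinNode* node = new BinNode(item);
        if (parent == nullptr)            root = node;          // tree was empty
        else if (item < parent->data)     parent->left = node;
        else                              parent->right = node;
    }

    // Public method: delegates to the private recursive helper.
    std::vector<int> inorder() const
    {
        std::vector<int> out;
        inorderAux(root, out);
        return out;
    }

private:
    struct BinNode {
        int data;
        BinNode* left;
        BinNode* right;
        explicit BinNode(int d) : data(d), left(nullptr), right(nullptr) {}
    };

    // Private auxiliary method: left subtree, node, right subtree.
    static void inorderAux(const BinNode* p, std::vector<int>& out)
    {
        if (p == nullptr) return;
        inorderAux(p->left, out);
        out.push_back(p->data);
        inorderAux(p->right, out);
    }

    static void destroy(BinNode* p)
    {
        if (p == nullptr) return;
        destroy(p->left);
        destroy(p->right);
        delete p;
    }

    BinNode* root;
};
```

Inserting 50, 30, 70, 20, 40 and then calling inorder() yields the values in ascending order, which is the property the traversal slide relies on.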
Recursive Deletion
Three possible cases to delete a node, x, from a BST
1. The node, x, is a leaf
2. The node, x, has one child
3. The node, x, has two children
   • Replace contents of x with inorder successor
   • Delete node pointed to by xSucc as described for cases 1 and 2
Problem of Lopsidedness
Trees can be totally lopsided

Suppose each node has a right child only

Degenerates into a linked list
Processing time affected by "shape" of tree
Nyhoff, ADTs, Data Structures and Problem Solving with
C++, Second Edition, © 2005 Pearson Education, Inc. All
rights reserved. 0-13-140909-3

Hash Tables
In some situations faster search is needed

Solution is to use a hash function

Value of key field given to hash function

Location in a hash table is calculated
Hash Functions
Simple function could be to mod the value of the key by the size of
the table


H(x) = x % tableSize
Note that we have traded speed for wasted space

Table must be considerably larger than number of items anticipated

Suggested to be 1.5-2x larger
Hash Functions
Observe the problem with same value returned by h(x) for different
values of x


Called collisions
A simple solution is linear probing

Empty slots marked with -1

Linear search begins at
collision location

Continues until empty
slot found for insertion
Hash Functions
When retrieving a value
• Linear probe until found
• If empty slot encountered then value is not in table

If deletions permitted
• Slot can be marked so it will not be empty and cause an invalid linear probe
• Ex. -1 for unused slots, -2 for slots which used to contain data
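The following sketch puts the hash function H(x) = x % tableSize, linear probing, and the -1 / -2 slot markers together. The class name ProbingTable and its member names are hypothetical; keys are assumed to be non-negative ints, and insert does not check for duplicates.

```cpp
#include <vector>

const int EMPTY = -1;     // slot has never been used
const int DELETED = -2;   // slot used to contain data

class ProbingTable
{
public:
    explicit ProbingTable(int tableSize) : table(tableSize, EMPTY) {}

    // Probes linearly from key % size; returns false if the table is full.
    bool insert(int key)
    {
        int size = static_cast<int>(table.size());
        int start = key % size;
        for (int k = 0; k < size; ++k) {
            int i = (start + k) % size;          // wrap around the table
            if (table[i] == EMPTY || table[i] == DELETED) {
                table[i] = key;
                return true;
            }
        }
        return false;
    }

    // Returns the slot holding key, or -1 if key is not in the table.
    int find(int key) const
    {
        int size = static_cast<int>(table.size());
        int start = key % size;
        for (int k = 0; k < size; ++k) {
            int i = (start + k) % size;
            if (table[i] == key) return i;
            if (table[i] == EMPTY) return -1;    // never-used slot ends the probe
        }
        return -1;
    }

    // Marks the slot as DELETED so later probes continue past it.
    bool remove(int key)
    {
        int i = find(key);
        if (i < 0) return false;
        table[i] = DELETED;
        return true;
    }

private:
    std::vector<int> table;
};
```

With a table of size 7, the keys 10, 17, and 24 all hash to slot 3 and end up in slots 3, 4, and 5. After 17 is removed, a search for 24 still succeeds because slot 4 is marked as deleted rather than empty.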
Collision Reduction Strategies
Hash table capacity

Size of table must be 1.5 to 2 times the size of the number of items to be
stored

Otherwise probability of collisions is too high
Collision Reduction Strategies
Linear probing can result in primary clustering

Consider quadratic probing


Probe sequence from location i is
i + 1, i – 1, i + 4, i – 4, i + 9, i – 9, …

Secondary clusters can still form
Double hashing

Use a second hash function to determine probe sequence
Collision Reduction Strategies
Chaining

Table is a list or vector of head nodes to linked lists

When item hashes to location, it is added to that linked list
Improving the Hash Function

Ideal hash function

Simple to evaluate

Scatters items uniformly throughout table
Modulo arithmetic not so good for strings

Possible to manipulate numeric (ASCII) value of first and last characters
of a name
Categories of Sorting Algorithms

Selection sort

Make passes through a list

On each pass reposition correctly some element (largest or smallest)
Array Based Selection Sort PseudoCode
//x[0] is reserved; the list is stored in x[1]…x[n]
For i = 1 to n-1 do the following:
    //Find the smallest element in the sublist x[i]…x[n]
    Set smallPos = i and smallest = x[smallPos]
    For j = i + 1 to n do the following:
        If x[j] < smallest: //smaller element found
            Set smallPos = j and smallest = x[smallPos]
    End for
    //Now interchange smallest with x[i], first element of this sublist.
    Set x[smallPos] = x[i] and x[i] = smallest
End for
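A C++ rendering of the pseudocode above, as a sketch. Unlike the pseudocode it uses 0-based indexing on a std::vector (no reserved x[0]), and the function name selectionSort is chosen for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts x into ascending order by repeatedly selecting the smallest
// element of the unsorted sublist x[i..n-1] and moving it to position i.
void selectionSort(std::vector<int>& x)
{
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        std::size_t smallPos = i;
        for (std::size_t j = i + 1; j < x.size(); ++j)
            if (x[j] < x[smallPos])      // smaller element found
                smallPos = j;
        std::swap(x[smallPos], x[i]);    // interchange smallest with x[i]
    }
}
```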
In-Class Exercise #1: Selection Sort

List of 9 elements:
90, 10, 80, 70, 20, 30, 50, 40, 60
Illustrate each pass…
Selection Sort Solution
Pass   List
0      90 10 80 70 20 30 50 40 60
1      10 90 80 70 20 30 50 40 60
2      10 20 80 70 90 30 50 40 60
3      10 20 30 70 90 80 50 40 60
4      10 20 30 40 90 80 50 70 60
5      10 20 30 40 50 80 90 70 60
6      10 20 30 40 50 60 90 70 80
7      10 20 30 40 50 60 70 90 80
8      10 20 30 40 50 60 70 80 90
Categories of Sorting Algorithms

Exchange sort

Systematically interchange pairs of elements which are out of order

Bubble sort does this
Out of order, exchange
In order, do not exchange
Bubble Sort Algorithm
1. Initialize numCompares to n - 1
2. While numCompares != 0, do following
   a. Set last = 1 // location of last element in a swap
   b. For i = 1 to numCompares
        if x[i] > x[i + 1]
          Swap x[i] and x[i + 1] and set last = i
   c. Set numCompares = last – 1
End while
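The algorithm above might be written in C++ roughly as follows. This sketch uses 0-based indexing, so last holds the 0-based index of the last pair swapped and the update becomes numCompares = last; the function name bubbleSort is an assumption of the example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort that shrinks the compared region to the position of
// the last swap, so already-sorted tail elements are skipped.
void bubbleSort(std::vector<int>& x)
{
    std::size_t numCompares = x.size() < 2 ? 0 : x.size() - 1;
    while (numCompares != 0) {
        std::size_t last = 0;                    // no swap seen yet
        for (std::size_t i = 0; i < numCompares; ++i) {
            if (x[i] > x[i + 1]) {               // out of order, exchange
                std::swap(x[i], x[i + 1]);
                last = i;
            }
        }
        numCompares = last;                      // pairs 0..last-1 remain
    }
}
```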
In-Class Exercise #2: Bubble Sort

List of 9 elements:
90, 10, 80, 70, 20, 30, 50, 40, 60
Illustrate each pass…
Bubble Sort Solution
Pass   List
0      90 10 80 70 20 30 50 40 60
1      10 80 70 20 30 50 40 60 90
2      10 70 20 30 50 40 60 80 90
3      10 20 30 50 40 60 70 80 90
4      10 20 30 40 50 60 70 80 90
5      10 20 30 40 50 60 70 80 90
Categories of Sorting Algorithms

Insertion sort

Repeatedly insert a new element into an already sorted list

Note this works well with a linked list implementation
All these have computing time O(n^2)
Insertion Sort Pseudocode
(Instructor's Recommendation)
for j = 2 to A.length
    key = A[j]
    //Insert A[j] into the sorted sequence A[1..j-1]
    i = j-1
    while i > 0 and A[i] > key
        A[i+1] = A[i]
        i = i-1
    A[i+1] = key
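A possible C++ version of this pseudocode, using 0-based indexing on a std::vector. The function name insertionSort is chosen for this sketch; here i marks the slot where key will finally be stored.

```cpp
#include <cstddef>
#include <vector>

// Inserts a[j] into the already sorted prefix a[0..j-1] for each j.
void insertionSort(std::vector<int>& a)
{
    for (std::size_t j = 1; j < a.size(); ++j) {
        int key = a[j];
        std::size_t i = j;
        while (i > 0 && a[i - 1] > key) {   // shift larger elements right
            a[i] = a[i - 1];
            --i;
        }
        a[i] = key;                         // drop key into the gap
    }
}
```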
Insertion Sort Example
Pass   List
0      5 2 4 6 1 3
1      2 5 4 6 1 3
2      2 4 5 6 1 3
3      2 4 5 6 1 3
4      1 2 4 5 6 3
5      1 2 3 4 5 6
In-Class Exercise #3: Insertion Sort

List of 5 elements:
9, 3, 1, 5, 2
Illustrate each pass, along with algorithm values of key, j and i…
Insertion Sort Solution
Pass   List         key   j   i
0      9 3 1 5 2
1      3 9 1 5 2    3     2   1,0
2      1 3 9 5 2    1     3   2,1,0
3      1 3 5 9 2    5     4   3,2
4      1 2 3 5 9    2     5   4,3,2,1
Quicksort

A more efficient exchange sorting scheme than bubble sort
• A typical exchange involves elements that are far apart
• Fewer interchanges are required to correctly position an element.

Quicksort uses a divide-and-conquer strategy
• A recursive approach
• The original problem partitioned into simpler subproblems,
• Each subproblem considered independently.

Subdivision continues until subproblems obtained are simple enough to be solved directly
Quicksort
Choose some element called a pivot
Perform a sequence of exchanges so that
• All elements that are less than this pivot are to its left and
• All elements that are greater than the pivot are to its right.
Divides the (sub)list into two smaller sublists,
• Each of which may then be sorted independently in the same way.
Quicksort
If the list has 0 or 1 elements,
return. // the list is sorted
Else do:
Pick an element in the list to use as the pivot.
Split the remaining elements into two disjoint groups:
SmallerThanPivot = {all elements < pivot}
LargerThanPivot = {all elements > pivot}
Return the list rearranged as:
Quicksort(SmallerThanPivot),
pivot,
Quicksort(LargerThanPivot).
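The pseudocode above translates fairly directly into C++. This sketch makes two assumptions that the pseudocode leaves open: the pivot is the first element, and elements equal to the pivot go into the "larger" group so duplicates are not lost. It builds new vectors rather than exchanging elements in place, so it illustrates the recursion, not an efficient in-place partition.

```cpp
#include <cstddef>
#include <vector>

// Returns a sorted copy of list using the split-around-a-pivot scheme.
std::vector<int> quicksort(const std::vector<int>& list)
{
    if (list.size() <= 1) return list;          // 0 or 1 elements: sorted

    int pivot = list[0];                        // assumption: first element
    std::vector<int> smaller, larger;
    for (std::size_t i = 1; i < list.size(); ++i) {
        if (list[i] < pivot) smaller.push_back(list[i]);
        else                 larger.push_back(list[i]);
    }

    // Quicksort(SmallerThanPivot), pivot, Quicksort(LargerThanPivot)
    std::vector<int> result = quicksort(smaller);
    result.push_back(pivot);
    std::vector<int> right = quicksort(larger);
    result.insert(result.end(), right.begin(), right.end());
    return result;
}
```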
In-Class Exercise #4: Quicksort

List of 9 elements

30,10, 80, 70, 20, 90, 50, 40, 60

Pivot is the first element

Illustrate each pass

Clearly denote each sublist
Quicksort Solution
Pass   List
0      30 10 80 70 20 90 50 40 60
1      20 10 30 70 80 90 50 40 60
2      10 20 30 50 60 40 70 90 80
3      10 20 30 40 50 60 70 80 90
TO DO: How does this change if you choose the pivot as the median?
Heaps
A heap is a binary tree with properties:
1. It is complete
• Each level of tree completely filled
• Except possibly bottom level (nodes in left most positions)
2. The key in any node dominates the keys of its children
• Min-heap: Node dominates by containing a smaller key than its children
• Max-heap: Node dominates by containing a larger key than its children
Implementing a Heap

Use an array or vector

Number the nodes from top to bottom
• Number nodes on each row from left to right

Store data in ith node in ith location of array (vector)

In an array implementation children of ith node are at myArray[2*i] and myArray[2*i+1]

Parent of the ith node is at myArray[i/2]
Basic Heap Operations

Construct an empty heap

Check if the heap is empty

Insert an item

Retrieve the largest/smallest element

Remove the largest/smallest element
Basic Heap Operations

Insert an item

Place new item at end of array

“Bubble” it up to the correct place

Interchange with parent so long as it is greater/less than its parent
Basic Heap Operations

Delete max/min item

Max/Min item is the root, swap with last node in tree

Delete last element

Bubble the top element down until heap property satisfied

Interchange with larger of two children
Percolate Down Algorithm
1. Set c = 2 * r
2. While c <= n do following
   a. If c < n and myArray[c] < myArray[c + 1]
        Increment c by 1
   b. If myArray[r] < myArray[c]
        i. Swap myArray[r] and myArray[c]
        ii. Set r = c
        iii. Set c = 2 * c
      else
        Terminate repetition
End while
Heapsort
Given a list of numbers in an array


Stored in a complete binary tree
Convert to a heap

Begin at last node not a leaf

Apply percolate down to this subtree

Continue
Heapsort Algorithm
1. Consider x as a complete binary tree, use heapify to convert this tree to a heap
2. for i = n down to 2:
   a. Interchange x[1] and x[i] (puts largest element at end)
   b. Apply percolate_down to convert binary tree corresponding to sublist in x[1] .. x[i-1]
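A sketch of heapsort for a max-heap, keeping the slides' 1-based numbering by leaving x[0] unused. The function names percolateDown and heapsort are chosen for this example; the loop in percolateDown runs while the child index c is within the heap (c <= n).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Percolates x[r] down within the heap stored in x[1..n].
void percolateDown(std::vector<int>& x, std::size_t r, std::size_t n)
{
    std::size_t c = 2 * r;                       // left child of r
    while (c <= n) {
        if (c < n && x[c] < x[c + 1]) ++c;       // pick the larger child
        if (x[r] < x[c]) {
            std::swap(x[r], x[c]);
            r = c;
            c = 2 * c;
        } else {
            break;                               // heap property holds
        }
    }
}

// Sorts x[1..n] into ascending order; x[0] is not used.
void heapsort(std::vector<int>& x)
{
    if (x.size() < 3) return;                    // zero or one element
    std::size_t n = x.size() - 1;

    // Heapify: start at the last non-leaf node and work up to the root.
    for (std::size_t r = n / 2; r >= 1; --r)
        percolateDown(x, r, n);

    // Swap the root (largest) to the end, then restore the heap.
    for (std::size_t i = n; i >= 2; --i) {
        std::swap(x[1], x[i]);
        percolateDown(x, 1, i - 1);
    }
}
```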
Heapsort

Now swap element 1 (root of tree) with last element


This puts largest element in correct location
Use percolate down on remaining sublist

Converts from semi-heap to heap
In-Class Exercise #5: Heapsort

For each step, want to draw the heap and array

30, 10, 80, 70, 20, 90, 40
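Array?
Index:  1  2  3  4  5  6  7
Value: 30 10 80 70 20 90 40
[Figure: the same values as a complete binary tree with root 30, children 10 and 80, and leaves 70, 20, 90, 40]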
Step 1: Convert to a heap

Begin at the last node that is not a leaf, apply the percolate down
procedure to convert to a heap the subtree rooted at this node,
move to the preceding node and percolate down in that subtree
and so on, working our way up the tree, until we reach the root of
the given tree. (HEAPIFY)
Step 1 (ctd)
What is the last node that is not a leaf? 80
Apply percolate down (80 is exchanged with its larger child, 90)
Index:  1  2  3  4  5  6  7
Value: 30 10 90 70 20 80 40

Step 1 (ctd)
Percolate down at the preceding node, 10 (exchanged with its larger child, 70)
Index:  1  2  3  4  5  6  7
Value: 30 70 90 10 20 80 40

Step 1 (ctd)
Percolate down at the root, 30 (exchanged with 90, then with 80)
Index:  1  2  3  4  5  6  7
Value: 90 70 80 10 20 30 40
We now have a heap!
Step 2: Sort and Swap

The largest element is now at the root

Correctly position the largest element by swapping it with the
element at the end of the list and go back and sort the remaining
6 elements
Index:   1  2  3  4  5  6  7
Before: 90 70 80 10 20 30 40
After:  40 70 80 10 20 30 90
Step 2 (ctd)

This is not a heap. However, since only the root changed, it is a semiheap

Use percolate down to convert to a heap
[Figure: tree with root 40, children 70 and 80, and leaves 10, 20, 30]

Step 2 (ctd)
Index:                 1  2  3  4  5  6  7
After percolate down: 80 70 40 10 20 30 90
1. Swap
2. Prune
After swap and prune: 30 70 40 10 20 80 90
Continue the pattern
Index:                 1  2  3  4  5  6  7
After percolate down: 70 30 40 10 20 80 90
After swap and prune: 20 30 40 10 70 80 90

Continue the pattern
Index:                 1  2  3  4  5  6  7
After percolate down: 40 30 20 10 70 80 90
After swap and prune: 10 30 20 40 70 80 90

Continue the pattern
Index:                 1  2  3  4  5  6  7
After percolate down: 30 10 20 40 70 80 90
After swap and prune: 20 10 30 40 70 80 90

Complete!
Index:                 1  2  3  4  5  6  7
After percolate down: 20 10 30 40 70 80 90
After swap and prune: 10 20 30 40 70 80 90
Sorting Facts

Sorting schemes are either …
• internal -- designed for data items stored in main memory
• external -- designed for data items stored in secondary memory (disk drive)

Previous sorting schemes were all internal sorting algorithms:
• required direct access to list elements
  – not possible for sequential files
• made many passes through the list
  – not practical for files
Mergesort

Mergesort can be used both as an internal and an external sort.

A divide and conquer algorithm

Basic operation in mergesort is merging,

combining two lists that have previously been sorted

resulting list is also sorted.
Merge Algorithm
1. Open File1 and File2 for input, File3 for output
2. Read first element x from File1 and first element y from File2
3. While neither eof File1 nor eof File2
     If x < y then
       a. Write x to File3
       b. Read a new x value from File1
     Otherwise
       a. Write y to File3
       b. Read a new y from File2
   End while
4. If eof File1 encountered, copy rest of File2 into File3. If eof File2 encountered, copy rest of File1 into File3
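The merge step can be sketched in memory, with two sorted vectors standing in for File1 and File2 and the returned vector standing in for File3. The function name mergeSorted is an assumption of this example; a true external merge would read and write files instead.

```cpp
#include <cstddef>
#include <vector>

// Merges two ascending sequences into one ascending sequence.
std::vector<int> mergeSorted(const std::vector<int>& file1,
                             const std::vector<int>& file2)
{
    std::vector<int> file3;
    file3.reserve(file1.size() + file2.size());

    std::size_t i = 0, j = 0;
    while (i < file1.size() && j < file2.size()) {   // neither "file" at eof
        if (file1[i] < file2[j]) file3.push_back(file1[i++]);
        else                     file3.push_back(file2[j++]);
    }
    while (i < file1.size()) file3.push_back(file1[i++]);   // rest of File1
    while (j < file2.size()) file3.push_back(file2[j++]);   // rest of File2
    return file3;
}
```

Each element is written exactly once, so the running time is proportional to the total number of elements in the two inputs.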
Mergesort Algorithm
88
In-Class Exercise #6

Take File1 and File2 and produce a sorted File 3
File 1: 7 9 19 33 47 51 82 99
File 2: 11 18 24 49 61
File 3:
Mergesort Solution
File 1: 7 9 19 33 47 51 82 99
File 2: 11 18 24 49 61
File 3: 7 9 11 18 19 24 33 47 49 51 61 82 99
Fun Facts

Most of the time spent in merging
• Combining two sorted lists of size n/2
• What is the runtime of merge()? O(n)

Does not sort in-place
• Requires extra memory to do the merging
• Then copied back into the original memory

Good for external sorting
• Disks are slow
• Writing in long streams is more efficient
Binary Merge Sort

Given a single file

Split into two files
Binary Merge Sort

Merge first one-element "subfile" of F1 with first one-element subfile
of F2


Gives a sorted two-element subfile of F
Continue with rest of one-element subfiles
Binary Merge Sort

Split again

Merge again as before

Each time, the size of the sorted subgroups doubles
Binary Merge Sort

Last splitting gives two files each in order

Last merging yields a single file, entirely in order

Note we always are limited to subfiles of some power of 2
Natural Merge Sort

Allows sorted subfiles of other sizes


Number of phases can be reduced when file contains longer "runs" of
ordered elements
Consider file to be sorted, note in order groups
Natural Merge Sort

Copy alternate groupings into two files


Use the sub-groupings, not a power of 2
Look for possible larger groupings
Natural Merge Sort

Merge the corresponding sub files

EOF for F2, copy remaining groups from F1
Natural Merge Sort

Split again, alternating groups

Merge again, now two subgroups

One more split, one more merge gives sort