ITEC 2620A
Introduction to Data Structures
Instructor: Prof. Z. Yang
Course Website: http://people.math.yorku.ca/~zyang/itec2620a.htm
Office: TEL 3049
Graphs
Key Points
• Graph Algorithms
– Definitions, representations, analysis
– Shortest paths
– Minimum-cost spanning tree
Basic Definitions
• A graph G = ( V, E ) consists of a set of vertices V and a set
of edges E – each edge in E connects a pair of vertices in V.
• Graphs can be directed or undirected.
– in a directed graph each edge is drawn as an arrow; the first vertex is the source
• Graphs may be weighted.
– in a weighted graph each edge carries a weight; a graph can be both directed and weighted
• A vertex vi is adjacent to another vertex vj if they are
connected by an edge in E. These vertices are neighbors.
• A path is a sequence of vertices in which each vertex is
adjacent to its predecessor and successor.
• The length of a path is the number of edges in it.
– The cost of a path is the sum of edge weights in the path
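To make the definitions of path, length, and cost concrete, here is a minimal Python sketch. The graph, its weights, and the names `weights`, `is_path`, `path_length`, and `path_cost` are hypothetical and chosen only for illustration.

```python
# Hypothetical weighted undirected graph: each edge maps to its weight.
weights = {
    frozenset({"A", "B"}): 4,
    frozenset({"B", "C"}): 1,
    frozenset({"C", "D"}): 7,
}

def is_path(path):
    """True if every consecutive pair of vertices is joined by an edge."""
    return all(frozenset({u, v}) in weights for u, v in zip(path, path[1:]))

def path_length(path):
    """Length = number of edges = number of vertices minus one."""
    return len(path) - 1

def path_cost(path):
    """Cost = sum of the weights of the edges along the path."""
    return sum(weights[frozenset({u, v})] for u, v in zip(path, path[1:]))

path = ["A", "B", "C", "D"]
print(is_path(path), path_length(path), path_cost(path))  # True 3 12
```

Storing each undirected edge as a `frozenset` means the pair (A, B) and the pair (B, A) refer to the same edge.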
Basic Definitions (Cont’d)
• A cycle is a path of length greater than one that begins and
ends at the same vertex.
• A simple cycle is a cycle of length at least three that
does not visit any vertex (except the start/finish) more than
once.
• Two vertices are connected if there is a path between
them.
• A subset of vertices S is a connected component of G if
there is a path from each vertex vi to every other distinct
vertex vj in S, and no vertex outside S can be added to S
without breaking this property.
• The degree of a vertex is the number of edges incident to it.
– with no self-loops or parallel edges, this is the number of vertices adjacent to it
• A graph is acyclic if it has no cycles (e.g. a tree).
– A directed acyclic graph is called a DAG (a digraph is any directed graph)
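The definitions of degree and connected component can be checked on a small example. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the graph and the function names are hypothetical, and breadth-first search is just one reasonable way to find the components.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency list (dict of sets).
graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1},
    3: {4},
    4: {3},
    5: set(),
}

def degree(g, v):
    """Number of edges incident to v (no self-loops or parallel edges here)."""
    return len(g[v])

def connected_components(g):
    """Group vertices so that each group is one connected component."""
    seen = set()
    components = []
    for start in g:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in g[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

print(degree(graph, 0))             # 2
print(connected_components(graph))  # [{0, 1, 2}, {3, 4}, {5}]
```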
Representations
• The adjacency matrix of graph G = ( V, E ) for
vertices numbered 0 to n-1 is an n x n matrix M
where M[i][j] is 1 if there is an edge from vi to
vj, and 0 otherwise.
• The adjacency list of graph G = ( V, E ) for
vertices numbered 0 to n-1 consists of an
array of n linked lists. The ith linked list
includes the node j if there is an edge from vi
to vj.
• Example
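The example from the original slide is not available in this transcript. As a stand-in, the following Python sketch builds both representations for a small hypothetical directed graph; the edge list and variable names are assumptions, and Python lists stand in for the linked lists.

```python
# Hypothetical directed graph on vertices 0..3, given as (source, destination) pairs.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]

# Adjacency matrix: M[i][j] is 1 if there is an edge from i to j, else 0.
M = [[0] * n for _ in range(n)]
for i, j in edges:
    M[i][j] = 1

# Adjacency list: adj[i] holds every j with an edge from i to j.
# Python lists stand in for the linked lists described above.
adj = [[] for _ in range(n)]
for i, j in edges:
    adj[i].append(j)

for row in M:
    print(row)
print(adj)  # [[1, 2], [2], [], [0]]
```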
Comparisons and Analysis
• Space
– adjacency matrix uses O(|V|^2) space (constant, independent of |E|)
– adjacency list uses O(|V| + |E|) space (note: pointer overhead)
• better for sparse graphs (graphs with few edges)
• Access Time
– Is there an edge connecting vi to vj?
• adjacency matrix – O(1)
• adjacency list – O(d), where d is the degree of vi
– Visit all edges incident to vi
• adjacency matrix – O(n)
• adjacency list – O(d)
– The algorithm's primary operation and the density of the graph determine the
more efficient data structure.
• dense graphs (e.g. complete graphs) should use an adjacency matrix
• traversals of sparse graphs should use an adjacency list
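The access-time comparison above can be read directly off the code for each operation. This is a sketch under assumed names (`has_edge_matrix`, `neighbors_list`, and so on) with a hypothetical three-vertex graph; the comments state the cost of each operation, where n is the number of vertices and d is the degree of vertex i.

```python
def has_edge_matrix(M, i, j):
    """O(1): a single array lookup."""
    return M[i][j] == 1

def has_edge_list(adj, i, j):
    """O(d): scan the d neighbors stored for vertex i."""
    return j in adj[i]

def neighbors_matrix(M, i):
    """O(n): every column of row i must be checked."""
    return [j for j, bit in enumerate(M[i]) if bit == 1]

def neighbors_list(adj, i):
    """O(d): the neighbors are already stored together."""
    return list(adj[i])

# Hypothetical directed graph with edges 0->1, 0->2, 1->2.
M = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
adj = [[1, 2], [2], []]

print(has_edge_matrix(M, 0, 2), has_edge_list(adj, 0, 2))  # True True
print(neighbors_matrix(M, 0), neighbors_list(adj, 0))      # [1, 2] [1, 2]
```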
Spanning Tree and Shortest Paths
• Minimum-Cost Spanning Tree
– assume weighted (undirected) connected graph
– use Prim’s algorithm (a greedy algorithm)
• from visited vertices, pick least-cost edge to an unvisited
vertex
• Shortest Paths
– assume weighted (undirected) connected graph with non-negative edge weights
– use Dijkstra’s algorithm (a greedy algorithm)
• extend paths from the unvisited vertex with the least current cost
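The two greedy algorithms named above can be sketched as follows. The slides give no code, so this is one possible implementation, not the course's own: it assumes a priority queue from Python's `heapq` module, a hypothetical four-vertex graph, and non-negative edge weights for Dijkstra's algorithm.

```python
import heapq

# Hypothetical weighted undirected connected graph: vertex -> {neighbor: weight}.
graph = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}

def prim(g, start):
    """Return (total cost, tree edges) of a minimum-cost spanning tree."""
    visited = {start}
    # Candidate edges leaving the visited set, ordered by weight.
    frontier = [(w, start, v) for v, w in g[start].items()]
    heapq.heapify(frontier)
    total, tree = 0, []
    while frontier and len(visited) < len(g):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # this edge no longer reaches an unvisited vertex
        visited.add(v)
        total += w
        tree.append((u, v, w))
        for x, wx in g[v].items():
            if x not in visited:
                heapq.heappush(frontier, (wx, v, x))
    return total, tree

def dijkstra(g, source):
    """Return the least path cost from source to every reachable vertex."""
    dist = {source: 0}
    done = set()
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u in done:
            continue  # stale entry; u was already finalized at a lower cost
        done.add(u)
        for v, w in g[u].items():
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(queue, (d + w, v))
    return dist

print(prim(graph, "A"))      # (4, [('A', 'B', 2), ('B', 'C', 1), ('C', 'D', 1)])
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 2, 'C': 3, 'D': 4}
```

Both functions have the same shape: repeatedly take the cheapest candidate from the queue and skip it if its vertex is already visited. They differ only in the priority used, which is the edge weight alone for Prim and the total path cost from the source for Dijkstra.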
Hashing
Key Points
• Hash tables
• Hash functions
• Collision resolution and clustering
• Deletions
Indices vs. Keys
• Each key/record is associated with an array
slot.
• We could map each key to its slot.
– e.g. last name to apartment number
• We could then search either the array
(unsorted?) or a look-up table (sorted?).
• However, what if the look-up is actually a
calculated function?
– eliminate look-up!
Hash Functions
• A hash function h() converts a key
(integer, string, float, etc.) into a table
index.
• Example
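The example from the original slide is not available in this transcript. As a stand-in, here are two simple hash functions in Python. The table size of 11 and the names `hash_int` and `hash_str` are assumptions, and neither function is claimed to distribute real data well.

```python
TABLE_SIZE = 11  # hypothetical table size

def hash_int(key, size=TABLE_SIZE):
    """Integer key: the remainder after division by the table size."""
    return key % size

def hash_str(key, size=TABLE_SIZE):
    """String key: sum the character codes, then take the remainder."""
    return sum(ord(ch) for ch in key) % size

print(hash_int(1234))   # 1234 % 11 = 2
print(hash_str("abc"))  # (97 + 98 + 99) % 11 = 8
```

Summing character codes maps all anagrams of a string to the same index, which is one reason simple functions like this can distribute keys unevenly.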
Hash Tables
• Records are stored in slots specified by
a hash function.
• Look-up/store
– Convert key into a table index with hash
function h()
• h(key) = index
– Find record/empty slot starting at index =
h(key)
(use resolution policy if necessary)
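The look-up/store procedure above can be sketched as a small class. The resolution policy here is assumed to be linear probing, and the class name `HashTable`, the table size, and the modulus hash function for integer keys are all hypothetical choices for illustration.

```python
class HashTable:
    """Closed hash table; collisions resolved by linear probing (an assumed policy)."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size  # each slot holds (key, record) or None

    def _h(self, key):
        # Assumed hash function for integer keys.
        return key % self.size

    def store(self, key, record):
        index = self._h(key)
        for step in range(self.size):
            probe = (index + step) % self.size
            if self.slots[probe] is None or self.slots[probe][0] == key:
                self.slots[probe] = (key, record)
                return probe
        raise RuntimeError("table is full")

    def look_up(self, key):
        index = self._h(key)
        for step in range(self.size):
            probe = (index + step) % self.size
            if self.slots[probe] is None:
                return None  # reached an empty slot: key is absent
            if self.slots[probe][0] == key:
                return self.slots[probe][1]
        return None

table = HashTable()
print(table.store(5, "five"))      # 5
print(table.store(16, "sixteen"))  # 16 % 11 = 5 collides, stored in slot 6
print(table.look_up(16))           # sixteen
print(table.look_up(27))           # None
```

This sketch does not support deletion. Simply emptying a slot would break `look_up` for any key stored further along the same probe sequence.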
Comments
• Hash function should evenly distribute keys
across table.
– not easy given unspecified input data distribution
• Hash table should be about half full.
– note: time-space tradeoff
• more space -> less time
(and already twice as much space as a sorted array)
– if half full, 50% chance of one collision
• 25% chance of two collisions
• etc...
• 2 accesses on average
(approaches n as table fills)
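The "2 accesses on average" figure can be reproduced with a short calculation. This sketch assumes a simplified model in which each probe independently lands on an occupied slot with probability equal to the load factor; the function name `expected_accesses` is hypothetical. In a real table of n slots the count cannot exceed n, which is where the "approaches n" remark comes from.

```python
def expected_accesses(load_factor, terms=1000):
    """Sum k * P(exactly k accesses), assuming each probe independently
    hits an occupied slot with probability equal to the load factor."""
    p_free = 1.0 - load_factor
    return sum(k * load_factor ** (k - 1) * p_free for k in range(1, terms + 1))

print(round(expected_accesses(0.5), 6))  # 2.0
print(round(expected_accesses(0.9), 6))  # 10.0
```

Under this model the sum equals 1 / (1 - load factor), so a half-full table needs 2 accesses on average while a table that is 90% full needs 10.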
How to do better
• What to do with collisions?
– linear probing (“classic hashing”)
• if collision, search subsequent slots sequentially
• this causes clustering: runs of occupied slots grow longer
• To eliminate clustering, we would like each
remaining slot to have equal probability.
• Can’t use random – needs to be reproducible.
• Pseudo-random probing (see text)
• Goal of random probing? --> cause divergence
– Probe sequences should not all follow same path.
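One common way to get reproducible "random" probing is to fix a single shuffled list of offsets and reuse it for every key. The sketch below assumes that approach; it is not taken from the course text. The seed, the table size, and the names `PERM` and `probe_sequence` are hypothetical.

```python
import random

TABLE_SIZE = 11  # hypothetical table size

# A fixed seed makes the "random" permutation reproducible, so store and
# look-up follow the same probe sequence every time.
_rng = random.Random(2620)  # arbitrary seed chosen for illustration
PERM = list(range(1, TABLE_SIZE))
_rng.shuffle(PERM)

def probe_sequence(home, size=TABLE_SIZE):
    """Slots visited for a key whose home position is `home`."""
    return [home] + [(home + offset) % size for offset in PERM]

print(probe_sequence(3))
print(probe_sequence(4))
```

Because `PERM` is a permutation of 1 to 10, each sequence visits every slot exactly once. Keys with neighboring home positions jump to scattered slots instead of sharing one sequential run, which is the divergence described above.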
Quadratic Probing
• Simple divergence method
• Linear probing – ith probe is i slots away
• Quadratic probing – ith probe is i^2 slots away
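The difference between the two probe sequences is easy to see side by side. This sketch assumes the simplest quadratic form, in which the i-th probe is i^2 slots past the home position; the function names and the table size of 11 are hypothetical.

```python
def linear_probes(home, size, count):
    """The i-th probe is i slots past the home position."""
    return [(home + i) % size for i in range(count)]

def quadratic_probes(home, size, count):
    """The i-th probe is i*i slots past the home position."""
    return [(home + i * i) % size for i in range(count)]

print(linear_probes(3, 11, 5))     # [3, 4, 5, 6, 7]
print(quadratic_probes(3, 11, 5))  # [3, 4, 7, 1, 8]
```

Unlike linear probing, a quadratic sequence is not guaranteed to reach every slot in the table.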
Secondary Clustering
• If multiple keys are hashed to the same
index/home position, quadratic probing
still follows the same path each time.
– This is secondary clustering
• Use a second hash function to determine the
probe sequence (double hashing).
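A second hash function can set the step size of the probe sequence, so that keys sharing a home position still diverge. The functions `h1` and `h2` below are hypothetical choices for integer keys, and the table size of 11 is assumed to be prime so that every step size eventually reaches every slot.

```python
def h1(key, size):
    """Home position (assumed hash function for integer keys)."""
    return key % size

def h2(key, size):
    """Step size in 1..size-1; never 0, so the probe always moves."""
    return 1 + (key % (size - 1))

def probes(key, size, count):
    """The i-th probe is i * h2(key) slots past the home position."""
    return [(h1(key, size) + i * h2(key, size)) % size for i in range(count)]

# 5 and 16 share home position 5 in a table of size 11,
# but their probe sequences diverge after the first slot.
print(probes(5, 11, 4))   # [5, 0, 6, 1]
print(probes(16, 11, 4))  # [5, 1, 8, 4]
```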