ITEC 2620A Introduction to Data Structures
Instructor: Prof. Z. Yang
Course Website: http://people.math.yorku.ca/~zyang/itec 2620a.htm
Office: TEL 3049

Graphs

Key Points
• Graph Algorithms
  – Definitions, representations, analysis
  – Shortest paths
  – Minimum-cost spanning tree

Basic Definitions
• A graph G = (V, E) consists of a set of vertices V and a set of edges E.
  – each edge in E connects a pair of vertices in V
• Graphs can be directed or undirected.
  – redraw above with arrows
  – first vertex is the source
• Graphs may be weighted.
  – redraw above with weights, combine definitions
• A vertex vi is adjacent to another vertex vj if they are connected by an edge in E. These vertices are neighbors.
• A path is a sequence of vertices in which each vertex is adjacent to its predecessor and successor.
• The length of a path is the number of edges in it.
  – The cost of a path is the sum of the edge weights in the path.

Basic Definitions (Cont'd)
• A cycle is a path of length greater than one that begins and ends at the same vertex.
• A simple cycle is a cycle of length at least three that does not visit any vertex (except the start/finish) more than once.
• Two vertices are connected if there is a path between them.
• A subset of vertices S is a connected component of G if there is a path from each vertex vi to every other distinct vertex vj in S.
• The degree of a vertex is the number of edges incident to it.
  – the number of vertices that it is connected to
• A graph is acyclic if it has no cycles (e.g. a tree).
  – A directed acyclic graph is called a DAG. (A digraph is any directed graph, not necessarily acyclic.)

Representations
• The adjacency matrix of graph G = (V, E) for vertices numbered 0 to n-1 is an n x n matrix M where M[i][j] is 1 if there is an edge from vi to vj, and 0 otherwise.
• The adjacency list of graph G = (V, E) for vertices numbered 0 to n-1 consists of an array of n linked lists. The ith linked list includes the node j if there is an edge from vi to vj.
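To make the two representations concrete, here is a small sketch in Python (the graph, vertex count, and edges are made up for illustration), building both an adjacency matrix and an adjacency list for the same directed graph:

```python
# A small directed graph on vertices 0..3 (edges chosen for illustration).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: n x n, M[i][j] = 1 iff there is an edge from i to j.
M = [[0] * n for _ in range(n)]
for i, j in edges:
    M[i][j] = 1

# Adjacency list: one list per vertex (Python lists standing in for linked lists).
adj = [[] for _ in range(n)]
for i, j in edges:
    adj[i].append(j)

print(M[0][2])   # 1 -- O(1) edge test in the matrix
print(adj[2])    # [3] -- O(d) to visit all edges incident to vertex 2
```

For an undirected graph, each edge (i, j) would also be recorded as (j, i) in both structures.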
• Example

Comparisons and Analysis
• Space
  – adjacency matrix uses O(|V|^2) space
  – adjacency list uses O(|V| + |E|) space (note: pointer overhead)
    • better for sparse graphs (graphs with few edges)
• Access Time
  – Is there an edge connecting vi to vj?
    • adjacency matrix – O(1)
    • adjacency list – O(d)
  – Visit all edges incident to vi
    • adjacency matrix – O(n)
    • adjacency list – O(d)
  – The primary operation of the algorithm and the density of the graph determine the more efficient data structure.
    • complete graphs should use an adjacency matrix
    • traversals of sparse graphs should use an adjacency list

Spanning Tree and Shortest Paths
• Minimum-Cost Spanning Tree
  – assume a weighted (undirected) connected graph
  – use Prim's algorithm (a greedy algorithm)
    • from the visited vertices, pick the least-cost edge to an unvisited vertex
• Shortest Paths
  – assume a weighted (undirected) connected graph
  – use Dijkstra's algorithm (a greedy algorithm)
    • build paths from the unvisited vertex with the least current cost

HASHING

Key Points
• Hash tables
• Hash functions
• Collision resolution and clustering
• Deletions

Indices vs. Keys
• Each key/record is associated with an array slot.
• We could map each key to each slot.
  – e.g. last name to apartment number
• We could then search either the array (unsorted?) or a look-up table (sorted?).
• However, what if the look-up is actually a calculated function?
  – eliminate the look-up!

Hash Functions
• A hash function h() converts a key (integer, string, float, etc.) into a table index.
• Example

Hash Tables
• Records are stored in slots specified by a hash function.
• Look-up/store
  – Convert the key into a table index with the hash function h()
    • h(key) = index
  – Find the record/empty slot starting at index = h(key) (use a resolution policy if necessary)

Comments
• The hash function should evenly distribute keys across the table.
  – not easy given an unspecified input data distribution
• The hash table should be about half full.
  – note: time-space tradeoff
    • more space -> less time (and already twice as much space as a sorted array)
  – if half full, 50% chance of one collision
    • 25% chance of two collisions
    • etc...
  – 2 accesses on average (approaches n as the table fills)

How to do better
• What to do with collisions?
  – linear probing ("classic hashing")
    • if there is a collision, search slots sequentially
• To eliminate clustering, we would like each remaining slot to have equal probability.
• Can't use random – needs to be reproducible.
• Pseudo-random probing (see text)
• Goal of random probing? --> cause divergence
  – Probe sequences should not all follow the same path.

Quadratic Probing
• A simple divergence method
• Linear probing
  – the ith probe is i slots away
• Quadratic probing
  – the ith probe is i^2 slots away

Secondary Clustering
• If multiple keys are hashed to the same index/home position, quadratic probing still follows the same path each time.
  – This is secondary clustering.
• Use a second hash function to determine the probe sequence.
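The difference between the probing policies above can be seen by printing the probe sequences themselves. In this sketch the table size, keys, and the hash functions h1/h2 are made-up illustrations; note how two keys with the same home position follow identical quadratic paths (secondary clustering) but diverge under double hashing:

```python
# Compare probe sequences for three collision-resolution policies.
TABLE_SIZE = 11

def h1(key):
    return key % TABLE_SIZE

def h2(key):
    # Second hash function for double hashing; must never return 0.
    return 1 + (key % (TABLE_SIZE - 1))

def probes(key, policy, count=4):
    """First `count` slots visited for `key` under the given policy."""
    home = h1(key)
    if policy == "linear":
        return [(home + i) % TABLE_SIZE for i in range(count)]
    if policy == "quadratic":
        return [(home + i * i) % TABLE_SIZE for i in range(count)]
    if policy == "double":
        return [(home + i * h2(key)) % TABLE_SIZE for i in range(count)]

# Keys 3 and 14 share the same home position (3 % 11 == 14 % 11 == 3).
print(probes(3, "quadratic"))   # [3, 4, 7, 1]
print(probes(14, "quadratic"))  # [3, 4, 7, 1] -- same path: secondary clustering
print(probes(3, "double"))      # [3, 7, 0, 4]
print(probes(14, "double"))     # [3, 8, 2, 7] -- different path: diverges
```

Linear probing is worse still: every key whose probe sequence touches a slot then follows the same run of slots, which is the primary clustering the slides describe.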
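Putting the hash-table slides together, here is a minimal open-addressing table with linear probing, the "classic hashing" described above. The class name, table size, and example keys are illustrative; deletions (which need tombstones) are omitted to keep the sketch short:

```python
class LinearProbingTable:
    """Minimal open-addressing hash table with linear probing.
    (Deletion support would require tombstone markers; omitted here.)"""

    def __init__(self, size=8):
        self.slots = [None] * size      # each slot holds (key, record) or None
        self.size = size

    def _h(self, key):
        return hash(key) % self.size    # hash function h(): key -> home index

    def store(self, key, record):
        i = self._h(key)
        for _ in range(self.size):      # probe sequentially from the home position
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, record)
                return
            i = (i + 1) % self.size     # collision: try the next slot
        raise RuntimeError("table full")

    def lookup(self, key):
        i = self._h(key)
        for _ in range(self.size):
            if self.slots[i] is None:
                return None             # empty slot: key is absent
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.size
        return None

t = LinearProbingTable()
t.store(42, "Smith")
t.store(50, "Jones")   # 50 % 8 == 42 % 8 == 2, so this probes to the next slot
print(t.lookup(42))    # Smith
print(t.lookup(50))    # Jones
```

With 2 of 8 slots filled the table is well under half full; as it fills, probe runs grow and the average access count climbs toward n, matching the time-space tradeoff noted above.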
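Finally, circling back to the graph slides: the two greedy algorithms (Prim's and Dijkstra's) can be sketched side by side. The example graph and its weights are made up for illustration, and both routines use a simple list scan instead of a priority queue to keep the sketch short:

```python
# Weighted undirected graph as an adjacency list:
# vertex -> [(neighbor, weight), ...]; example graph chosen for illustration.
graph = {
    0: [(1, 4), (2, 1)],
    1: [(0, 4), (2, 2), (3, 5)],
    2: [(0, 1), (1, 2), (3, 8)],
    3: [(1, 5), (2, 8)],
}

def prim_cost(g, start=0):
    """Total cost of a minimum-cost spanning tree (Prim's greedy algorithm):
    repeatedly add the least-cost edge from a visited vertex to an unvisited one."""
    visited = {start}
    total = 0
    while len(visited) < len(g):
        w, v = min((w, v) for u in visited for v, w in g[u] if v not in visited)
        visited.add(v)
        total += w
    return total

def dijkstra(g, start=0):
    """Shortest-path costs from start (Dijkstra's greedy algorithm):
    repeatedly settle the unvisited vertex with the least current cost."""
    dist = {v: float("inf") for v in g}
    dist[start] = 0
    unvisited = set(g)
    while unvisited:
        u = min(unvisited, key=lambda v: dist[v])
        unvisited.remove(u)
        for v, w in g[u]:
            dist[v] = min(dist[v], dist[u] + w)
    return dist

print(prim_cost(graph))   # MST edges (0-2), (2-1), (1-3): cost 1 + 2 + 5 = 8
print(dijkstra(graph))    # {0: 0, 1: 3, 2: 1, 3: 8}
```

The list scans make each step O(|V|) or O(|E|); a real implementation would use a heap, but the greedy choice in each loop is exactly the rule stated on the slides.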