CSE 5350/7350 Introduction to Algorithms

Data Structures: Specification and Implementation
Textbook readings: Cormen, Part III, Chapters 10-14
Mihaela Iridon, Ph.D.
[email protected]
CSE 5350 - Fall 2007

Objectives
• Understand what dynamic sets are
• Learn basic techniques for
  a) representing and
  b) manipulating finite dynamic sets
• Elementary data structures
  – Stacks, queues, heaps, linked lists
• More complex data structures
  – Hash tables, binary search trees
• Data structures in C#.NET 2.0

High-Level Structure (1)
• Arrays
  – System.Collections.ArrayList
  – System.Collections.Generic.List
• Queue
  – System.Collections.Generic.Queue
• Stack
  – System.Collections.Generic.Stack

High-Level Structure (2)
• Hashtable
  – System.Collections.Hashtable
  – System.Collections.Generic.Dictionary
• Trees
  – Binary trees, BST, self-balancing BST
  – Linked lists: System.Collections.Generic.LinkedList
• Graphs

Dynamic Data Sets
• Definition
• Why dynamic
• General examples
• Data structures and the .NET framework
• "An Extensive Examination of Data Structures Using C# 2.0" – Scott Mitchell
• http://msdn2.microsoft.com/en-us/library/ms364091(VS.80).aspx

Data Structure Design
• Impact on efficiency/running time
• The data structure used by an algorithm can greatly affect the algorithm's performance
• It is important to have a rigorous method for comparing the efficiency of various data structures

Example: file extension search

    // Requires: using System; using System.IO;
    public bool DoesExtensionExist(string[] fileNames, string extension)
    {
        // Compare each file's extension with the target, ignoring case
        for (int i = 0; i < fileNames.Length; i++)
            if (String.Compare(Path.GetExtension(fileNames[i]), extension, true) == 0)
                return true;
        return false; // If we reach here, we didn't find the extension
    }

• The search is O(n): in the worst case every file name is examined

The Array
• Linear
• Simple
• Direct access
• Homogeneous
• Most widely used

The Array (2)
• The contents of an array are stored in contiguous memory.
• All of the elements of an array must be of the same type or of a derived type; hence arrays are referred to as homogeneous data structures.
• Array elements can be accessed directly. To access the i-th element, one line of code suffices: arrayName[i].

Array Operations
• Allocation
• Accessing
  – Declaring an array in C#: string[] myArray; (initially the myArray reference is null)
  – Creating an array in C#: myArray = new string[5];

Array Allocation
• string[] myArray = new string[someIntegerSize];
• This allocates a contiguous block of memory on the (CLR-managed) heap

Array Accessing
• Accessing an element at index i: O(1)
• Searching through an array
  – Unsorted: O(n)
  – Sorted: O(log n)
• Array class, static method:
  – Array.BinarySearch(Array input, object val)

Array Resizing
• When the size needs to change:
  – Must create a new array instance
  – Copy the old array into the new array: Array1.CopyTo(Array2, 0)
• Time consuming
• Also, inserting into an array is problematic

Multi-Dimensional Arrays
• Rectangular
  – n x n
  – n x n x n x …
  – Accessing: O(1)
  – Searching: O(n^k) for k dimensions
• Jagged/Ragged
  – n1 x n2 x n3 x …

Goals
• Type-safe
• Performant
• Reusable
• Example: payroll application

System.Collections.ArrayList
• Can hold any data type (hybrid)
• Internally: an array of object
• Automatic resizing
• Not type-safe: casting errors are detected only at runtime
• Boxing/unboxing: the extra level of indirection affects performance
• Loose homogeneity

Generics
• Remedy for typing and performance
• Type-safe collections
• Reusability
• Example:

    public class MyTypeSafeList<T>
    {
        T[] innerArray = new T[0];
    }

List
• Homogeneous
• Self-redimensioning array
• System.Collections.Generic.List

    List<string> studentNames = new List<string>();
    studentNames.Add("John");
    …
    string name = studentNames[3];
    studentNames[2] = "Mike";

List Methods
• Contains()
• IndexOf()
• BinarySearch()
• Find()
• FindAll()
• Sort()
  – Asymptotic running time: same as for an array, but with extra overhead

Ordered Requests Processing
• First-come, first-served (FIFO)
• Priority-based processing
• Inefficient to use List<T>
• The list will continue to grow (internally, the size is doubled every time)
• Solution: circular list/array
• Problem: initial size?

Queue
• System.Collections.Generic.Queue
• Operations:
  – Enqueue()
  – Dequeue()
  – Contains()
  – ToArray()
  – Peek()
• Does not allow random access
• Type-safe; maximizes space utilization

Queue (continued)
• Applications:
  – Web servers
  – Print queues
• Rate of growth:
  – Specified in the constructor
  – Default: the capacity is doubled

Stack
• LIFO
• System.Collections.Generic.Stack
• Operations:
  – Push()
  – Pop()
• Doubles in size when more space is needed
• Applications:
  – CLR call stack (function invocation)

Limitations of Ordinal Indexing
• Ideal access time: O(1)
• If the index is unknown
  – O(n) if not sorted
  – O(log n) if sorted
• Example: SSN: 10^9 possible combinations
• Solution: compress the ordinal indexing domain with a hash function; e.g. use only 4 digits

Hash Table
• Hashing:
  – Mathematical transformation of one representation into another representation
• Hash table:
  – An array that uses hashing to compress the space of indexers
• Cryptography (information security)
• Hash function:
  – Non-injective (not a one-to-one function)
  – "Fingerprint" of the initial data

Goals
• Fast access to items in large amounts of data
• As few collisions as possible – collision avoidance
• Avalanche effect:
  – Minor changes to the input produce major changes in the output

Collision Resolution (1)
• Probability of mapping to a given location: 1/k (k = size = number of slots)
• (1) Linear probing: is H[i] empty?
  – YES: place the item at location i
  – NO: i = i + 1; repeat
  – Deficiency: clustering
  – Access and insertion: no longer O(1)

Collision Resolution (2)
• (2) Quadratic probing, starting from slot s:
  – Check s + 1^2
  – Check s – 1^2
  – Check s + 2^2
  – Check s – 2^2
  – …
  – Check s +/- i^2
  – Clustering is a problem here as well

Collision Resolution (3)
• (3) Rehashing – used by Hashtable (C#)
• System.Collections.Hashtable
• Operations:
  – Add(key, item)
  – ContainsKey()
  – Keys (property)
  – ContainsValue()
  – Values (property)
• Key, Value: any type, hence not type-safe

Hashtable Data Type – Example

    using System;
    using System.Collections;

    public class HashtableDemo
    {
        private static Hashtable employees = new Hashtable();

        public static void Main()
        {
            // Add some values to the Hashtable, indexed by a string key
            employees.Add("111-22-3333", "Scott");
            employees.Add("222-33-4444", "Sam");
            employees.Add("333-44-5555", "Jisun");

            // Access a particular key
            if (employees.ContainsKey("111-22-3333"))
            {
                string empName = (string) employees["111-22-3333"];
                Console.WriteLine("Employee 111-22-3333's name is: " + empName);
            }
            else
                Console.WriteLine("Employee 111-22-3333 is not in the hash table...");
        }
    }

Hashtable
• Key = any type
• The key is transformed into an index via the GetHashCode() function
• The Object class defines GetHashCode()
• H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize
• Values = 0 .. hashsize - 1

Collision Resolution (3 – cont'd)
• Rehashing = double hashing
• Set of hash functions: H1, H2, …, Hn
• Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
• hashsize must be PRIME

Hashtable
• Load factor = maximum allowed ratio (# items / # slots)
• Optimal: 0.72
• Expanding the hashtable takes 2 (costly) steps:
  – Roughly double the number of slots (current prime → next prime that is about twice as large)
  – Rehash
• High load factor → dense hashtable
  – Less space
  – More probes on collision: 1/(1 - LF)
  – If LF = 0.72, the expected number of probes is 1/(1 - 0.72) ≈ 3.57, which is still O(1)

Hashtable
• Costly to expand
• Set the size in the constructor if the size is known
• Asymptotic running times:
  – Access: O(1)
  – Add, Remove: O(1)
  – Search: O(1)

System.Collections.Generic.Dictionary
• Type-safe
• Strongly typed KEYS + VALUES
• Operations:
  – Add(key, value)
  – ContainsKey(key)
• Collision resolution: CHAINING
  – Uses a linked list starting at the entry where the collision occurs

Chaining in Dictionary Data Type
[Figure: chaining in the Dictionary data type]

Dictionary Example

    // General form
    Dictionary<keyType, valueType> variableName = new Dictionary<keyType, valueType>();

    Dictionary<int, Employee> employeeData = new Dictionary<int, Employee>();

    // Add some employees
    employeeData.Add(455110189, new Employee("Scott Mitchell"));
    employeeData.Add(455110191, new Employee("Jisun Lee"));
    ...

    // See if employee with SSN 123-45-6789 works here
    if (employeeData.ContainsKey(123456789))
        ...
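To make the chaining idea concrete, here is a minimal sketch of a hash table that resolves collisions with one linked list per bucket. The class name ChainedTable and its members are hypothetical and chosen for illustration only; this is not how System.Collections.Generic.Dictionary is actually implemented, and the sketch assumes non-null keys, never resizes, and does not reject duplicate keys.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical teaching class: separate chaining with one linked list per bucket.
// Simplifications (assumptions): fixed bucket count, no resizing, no duplicate check.
public class ChainedTable<TKey, TValue>
{
    // buckets[i] is the chain for slot i; null means the slot is still empty.
    private readonly LinkedList<KeyValuePair<TKey, TValue>>[] buckets;

    public ChainedTable(int bucketCount)
    {
        buckets = new LinkedList<KeyValuePair<TKey, TValue>>[bucketCount];
    }

    // Map a key to a slot index in 0 .. buckets.Length - 1.
    private int SlotOf(TKey key)
    {
        // Clear the sign bit so the remainder is never negative.
        return (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
    }

    // O(1): prepend to the chain of the key's slot.
    public void Add(TKey key, TValue value)
    {
        int i = SlotOf(key);
        if (buckets[i] == null)
            buckets[i] = new LinkedList<KeyValuePair<TKey, TValue>>();
        buckets[i].AddFirst(new KeyValuePair<TKey, TValue>(key, value));
    }

    // Walks a single chain, so the cost is proportional to that chain's length.
    public bool TryGetValue(TKey key, out TValue value)
    {
        LinkedList<KeyValuePair<TKey, TValue>> chain = buckets[SlotOf(key)];
        if (chain != null)
        {
            foreach (KeyValuePair<TKey, TValue> pair in chain)
            {
                if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }

    public bool ContainsKey(TKey key)
    {
        TValue ignored;
        return TryGetValue(key, out ignored);
    }
}
```

With a single bucket (new ChainedTable<int, string>(1)) every key collides and the table degenerates into one linked list, which is a quick way to see why lookups depend on the ratio of stored items to buckets.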
Chaining in the Dictionary type
• Efficiency:
  – Add: O(1)
  – Remove: O(n/m)
  – Search: O(n/m)
  where n = number of elements in the hash table and m = number of buckets/slots
• Implemented such that n never exceeds m
  – The total number of chained elements can never exceed the number of buckets, so n/m ≤ 1 and Remove and Search are O(1) on average

Trees
• A tree is a set of linked nodes in which no cycle exists
• (Graph theory) a connected acyclic graph
• Nodes:
  – Root
  – Leaf
  – Internal
• |E| = ?
• Forest = { trees }

Popular Tree-Type Data Structures
• BST: binary search tree
• Heap
• Self-balancing binary search trees
  – AVL
  – Red-black
• Radix tree
• …

Binary Trees
• Code example for defining a tree data object
• Tree traversal (L = left subtree, Ro = root, R = right subtree)
  – In-order: L Ro R
  – Pre-order: Ro L R
  – Post-order: L R Ro
  – Each traversal takes Θ(n)

Binary Tree Data Structure
[Figure: binary tree data structure]

Tree Operations
• Search: recursive, O(h)
  – h = height of the tree
• Max & min search: keep following the right/left child
• Successor & predecessor search
• Insertion (easy: always add a new leaf) & deletion (more complicated, as it may cause the tree structure to change)
• Running time:
  – A function of the tree topology

Binary Search Tree
• Improves the search time (and lookup time) over the binary tree in general
• BST property:
  – For any node n, every descendant node's value in the left subtree of n is less than the value of n, and every descendant node's value in the right subtree is greater than the value of n

Non-BST vs BST
[Figure: (a) a non-BST; (b) a BST]

Linear Search Time in BST
The search time for a BST depends upon its topology: in a degenerate tree, where every node has a single child, the height is n and a search takes O(n).
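The dependence of BST search time on topology can be seen in a small sketch. The names BstNode and Bst are hypothetical, the keys are assumed to be distinct integers, and the code omits deletion and balancing; it is meant as an illustration, not as a reference implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type for a binary search tree of integers.
public class BstNode
{
    public int Value;
    public BstNode Left;
    public BstNode Right;
    public BstNode(int value) { Value = value; }
}

// Teaching sketch: no deletion, no balancing, duplicate values are ignored.
public static class Bst
{
    // Insertion always adds a new leaf; returns the root of the updated subtree.
    public static BstNode Insert(BstNode root, int value)
    {
        if (root == null) return new BstNode(value);
        if (value < root.Value) root.Left = Insert(root.Left, value);
        else if (value > root.Value) root.Right = Insert(root.Right, value);
        return root;
    }

    // Search follows a single root-to-leaf path, so it costs O(h).
    public static bool Contains(BstNode root, int value)
    {
        BstNode current = root;
        while (current != null)
        {
            if (value == current.Value) return true;
            current = value < current.Value ? current.Left : current.Right;
        }
        return false;
    }

    // Height counted in nodes on the longest root-to-leaf path (empty tree = 0).
    public static int Height(BstNode root)
    {
        if (root == null) return 0;
        return 1 + Math.Max(Height(root.Left), Height(root.Right));
    }

    // In-order traversal (L Ro R) visits the values in ascending order.
    public static void InOrder(BstNode root, List<int> output)
    {
        if (root == null) return;
        InOrder(root.Left, output);
        output.Add(root.Value);
        InOrder(root.Right, output);
    }
}
```

Inserting 1, 2, …, 7 in that order produces a right-leaning chain of height 7, while inserting 4, 2, 6, 1, 3, 5, 7 produces a tree of height 3 over the same keys; both yield the same in-order sequence.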
BST continued
• Perfectly balanced BST:
  – Search: O(log n) [height = log n]
• Sub-linear search running time
• Balanced binary tree:
  – Exhibits a good ratio of breadth to height
• Self-balancing trees

The Heap
• Specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B) [max-heap]
• Operations:
  – delete-max or delete-min: removing the root node of a max- or min-heap, respectively
  – increase-key or decrease-key: updating a key within a max- or min-heap, respectively
  – insert: adding a new key to the heap
  – merge: joining two heaps to form a valid new heap containing all the elements of both

Max Heap Example
[Figure: example of a max-heap]

Linked Lists
• No resizing necessary
• Search: O(n)
• Insertion
  – O(1) if unsorted
  – O(n) if sorted
• Access: O(n)
• System.Collections.Generic.LinkedList
  – Doubly linked; type-safe (via generics)
  – Element: LinkedListNode

Skip List
• Linked list with a self-balancing-BST-like property
• The elements are sorted
• Height = log n
• Problems with insert & delete
• Solution: randomized distribution
• Overall: O(log n)
• Worst case: O(n), but the chance of reaching the worst case is very slim

Skip List Examples
[Figure: skip list examples]

Graphs
• A collection of interconnected nodes
• A graph or undirected graph G is an ordered pair G := (V, E) that is subject to the following conditions:
  – V is a set, whose elements are called vertices or nodes
  – E is a set of (unordered) pairs of distinct vertices, called edges or lines
• Edges:
  – Directed or undirected
  – Weighted or unweighted

Graph (cont'd)
• Sparse: |E| << |Emax|, i.e. |E| << n^2 (n = |V|)
• Representation:
  – Adjacency list
  – Adjacency matrix
  – (Packed edge list)
• Problems applicable to graphs:
  – Minimum spanning tree (Kruskal, Prim)
  – Shortest path (Dijkstra)

Website Navigation as a Graph
[Figure: website navigation as a graph]

Distance Graph Example
[Figure: distance graph example]

Graph Representation
[Figure: graph representation]

Minimum Spanning Tree
• Spanning tree of a connected, undirected graph = a subset of the edges that connects all the nodes without introducing a cycle
• A minimum spanning tree is a spanning tree whose total edge weight is as small as possible

Kruskal's Algorithm
[Figure: Kruskal's algorithm]

Prim's Algorithm
[Figure: Prim's algorithm]
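Since the slides for Kruskal's algorithm survive only as titles, the following is a minimal sketch of the standard algorithm: sort the edges by weight and add each edge unless it would close a cycle, using a union-find structure to detect cycles. The names Edge and KruskalSketch are hypothetical, vertices are assumed to be numbered 0 .. vertexCount - 1, and this is not taken from the original slides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical edge type: an undirected, weighted edge between vertices U and V.
public struct Edge
{
    public int U, V;
    public double Weight;
    public Edge(int u, int v, double weight) { U = u; V = v; Weight = weight; }
}

public static class KruskalSketch
{
    // Union-find lookup with path halving: returns the representative of x's component.
    private static int Find(int[] parent, int x)
    {
        while (parent[x] != x)
        {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // Assumes vertices are numbered 0 .. vertexCount - 1.
    // For a disconnected graph the result is a minimum spanning forest.
    public static List<Edge> MinimumSpanningTree(int vertexCount, IEnumerable<Edge> edges)
    {
        // Consider the edges in order of increasing weight.
        List<Edge> sorted = new List<Edge>(edges);
        sorted.Sort((a, b) => a.Weight.CompareTo(b.Weight));

        // Initially every vertex is its own component.
        int[] parent = new int[vertexCount];
        for (int i = 0; i < vertexCount; i++) parent[i] = i;

        List<Edge> tree = new List<Edge>();
        foreach (Edge e in sorted)
        {
            int ru = Find(parent, e.U);
            int rv = Find(parent, e.V);
            if (ru == rv) continue;      // same component: this edge would close a cycle
            parent[ru] = rv;             // merge the two components
            tree.Add(e);
            if (tree.Count == vertexCount - 1) break; // spanning tree is complete
        }
        return tree;
    }
}
```

Prim's algorithm reaches the same kind of result by growing a single tree from a start vertex instead of merging components; the sketch above covers only Kruskal's approach.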