CSE 373 - Data Structures - Dr. Manal Helal Moodle Site
AASTMT Engineering and Technology College
CC 215 DATA STRUCTURES
Lecture 1
Dr. Manal Helal - Fall 2014

Staff
• Instructor: Dr. Manal Helal, [email protected]
• TA: Eng. Nour El Din S Eissa, [email protected]

CC 215 Data Structures Fall 2014

Course Description
The course tackles the difference between static and dynamic data types. Pointers and dynamic memory allocation are discussed, allowing students to gain practical programming experience with dynamic structures.

Course Topics
• Introduction to static vs dynamic data structures
• Stack data type
• Implementation of stacks in different applications
• Queue data type
• Introduction to dynamic programming using pointers
• Linked lists
• Double and circular linked lists
• Introduction to tree structures
• Tree traversals
• Threaded tree
• Graph representation and traversals
• Graphs: minimum spanning tree and shortest path

Grading Scheme
Week 7:
• Quizzes: 5%
• Lab Submissions: 2.5%
• Assignments: 2.5%
• Midterm 1: 20%
Week 12:
• Quizzes: 5%
• Lab Submissions: 2.5%
• Assignments: 2.5%
• Midterm 2: 10%
Project: 10%
Final Exam: 40%

Course Rules
• Website: http://moodle.manalhelal.com/course/view.php?id=4
• Sign up using your first and last names exactly as they appear in your AASTMT records, and use your AASTMT student ID as your student ID. Otherwise, your grades will not be transferred to your academy records.
• Log in often to follow up with assignment deadlines and announcements. It's your responsibility to keep up with the course. Failure to receive emails or notifications is no excuse. Everything is announced in class and then published on moodle.
• Academic Honesty: Please conform to the AASTMT policies regarding plagiarism and cheating. A zero grade will be given for the violating submission the first time; after that, the violation is reported to the department for further action.
• Please ask questions in the online forum to allow everyone to join and benefit from the discussions, and avoid private emails to the lecturers and TAs.
• Contact the teachers privately only about your personal grades or circumstances, not about the course content.

How to score A+ in this course?
• Please do all practicals and assignments and study regularly.
• In case of problems, please ask questions. Accumulating problems will make things worse as the semester goes by.

Office Hours
• Dr. Manal Helal: Room 308, Sunday and Monday 2:00-4:00 p.m. or by appointment
• Eng. Nour: to be announced

Textbook
• Data Structures and Algorithm Analysis in C++, by Weiss
• See Web page (syllabus) for errata and source code

Class Overview
• Introduction to many of the basic data structures used in computer software
• Understand the data structures
• Basically analyze the algorithms that use them (more in the CC412 Algorithms course)
• Know when to apply them
• Practice design and analysis of data structures
• Practice using these data structures by writing programs
• Data structures are the plumbing and wiring of programs

Goal
• You will understand
  o what the tools are for storing and processing common data types
  o which tools are appropriate for which need
• So that you will be able to make good design choices as a developer, project manager, or system customer

Data Structures: What?
• Need to organize program data according to the problem being solved
• Abstract Data Type (ADT): a data object and a set of operations for manipulating it
  o List ADT with operations insert and delete
  o Stack ADT with operations push and pop
• Note the similarity to Java classes: private data structure and public methods

Data Structures: Why?
• Program design depends crucially on how data is structured for use by the program
  o Implementation of some operations may become easier or harder
  o Speed of program may dramatically decrease or increase
  o Memory used may increase or decrease
  o Debugging may become easier or harder

Terminology
• Abstract Data Type (ADT)
  o Mathematical description of an object with a set of operations on the object. Useful building block.
• Algorithm
  o A high-level, language-independent description of a step-by-step process
• Data structure
  o A specific family of algorithms for implementing an abstract data type
• Implementation of data structure
  o A specific implementation in a specific language

Algorithm Analysis: Why?
• Correctness: Does the algorithm do what is intended?
• Performance: What is the running time of the algorithm? How much storage does it consume?
• Different algorithms may correctly solve a given task. Which should I use? Answered in CC412.

Proof by Induction
• Basis Step: The algorithm is correct for the base case (e.g. n=0) by inspection.
• Inductive Hypothesis (n=k): Assume that the algorithm works correctly for the first k cases, for any k.
• Inductive Step (n=k+1): Given the hypothesis above, show that the k+1 case will be calculated correctly.

Program Correctness by Induction
• Basis Step: sum(v,0) = 0.
• Inductive Hypothesis (n=k): Assume sum(v,k) correctly returns the sum of the first k elements of v, i.e. v[0]+v[1]+…+v[k-1].
• Inductive Step (n=k+1): sum(v,n) returns v[k]+sum(v,k), which is the sum of the first k+1 elements of v.

Algorithm Execution-Time Analysis
(Simplified) definitions.
• Best-Case Execution Time (BCET): The shortest time that the algorithm takes to solve the problem.
• Worst-Case Execution Time (WCET): The longest time that the algorithm takes to solve the problem.
• Average-Case Execution Time (ACET): The expected time that the algorithm takes to solve the problem on average. The average execution time is the arithmetic mean of all execution times if all inputs are equally likely to occur; otherwise we use probability distributions to compute it. We will not focus on probability distributions of execution times in this course; this is for your own knowledge.

How do we compute the above? Not easy.

Algorithm Execution-Time Analysis
Sum the natural numbers 1 to n (an arithmetic series). We know that SUM = 1+2+...+n = n(n+1)/2.

A1
  Function SUM(n: ℕ) : ℕ
    result := n(n+1)/2
    return result

A2
  Function SUM(n: ℕ) : ℕ
    result := 0
    for i := 1 to n do
      result := result + i
    end for
    return result

Which algorithm is faster, A1 or A2? Note the use of pseudocode.

Pseudocode
• In the lectures, algorithms will be presented in pseudocode.
  o This is very common in the computer science literature.
  o Pseudocode is usually easily translated to real code.
  o This is programming language independent.

Algorithm Execution-Time Analysis
Algorithm A1
• A1 computes the result in a constant number of steps at runtime.
• What is the execution time T(A1) for A1? T(A1) = β1 (β1 is some constant).
• T(A1) does not depend on the input. If the input is 1, 10, 100, etc., T(A1) will not change (significantly).
Algorithm A2
• A2 computes the result in a variable number of steps at runtime, depending on the input size.
• What is the time T(A2) for A2? T(A2) = α2·n + β2 (α2, β2 are some constants).
• T(A2) depends on the input size. If the input is 1, 10, 100, etc., T(A2) will change.

Algorithm Execution-Time Analysis
Before we start to formalise the asymptotic-complexity analysis of algorithms, we make sure of the following.
We are interested in computing the growth function of an algorithm, which shows how the number of steps of the algorithm varies in terms of the size of its input. On the Y axis: the number of runtime steps. On the X axis: the size of the input.
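To make the comparison of A1 and A2 concrete, here is a minimal Python sketch of both algorithms. The names sum_a1 and sum_a2 and the explicit step counter are illustrative assumptions, not part of the lecture's pseudocode; the counter only tallies loop iterations, as a rough stand-in for the number of runtime steps.

```python
def sum_a1(n):
    """A1: closed-form arithmetic series; the work does not grow with n."""
    return n * (n + 1) // 2


def sum_a2(n):
    """A2: add 1..n in a loop; returns (result, loop iterations performed)."""
    result = 0
    steps = 0  # assumption: one "step" per loop iteration
    for i in range(1, n + 1):
        result = result + i
        steps += 1
    return result, steps
```

For example, sum_a2(10) returns (55, 10): the same value as sum_a1(10), reached after a number of iterations equal to n, which is the linear term in T(A2).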
The growth function of an algorithm has the form f(n), where n is the size of the input to the algorithm (X axis) and f(n) is the number of runtime steps needed to run the algorithm on an input of size n (Y axis).
Saying that the asymptotic complexity of an algorithm with growth function f(n) is O(g(n)) means that the asymptotic growth of f is bounded from above by that of g. More in CC412 and on the moodle website.

Algorithm Execution-Time Analysis
Which of A1 and A2 is faster?
• T(A1) = β1. When is this the case?
• T(A2) = α2·n + β2. When is this the case?

Algorithm Execution-Time Analysis
• We say that A1 has constant growth. Its execution time does not depend on the input value/size: T(A1) = β1.
• We say that A2 has linear growth in terms of the input. Its execution time grows as a linear function of the input value/size: T(A2) = α2·n + β2.
• A1 is more efficient than A2.
  o When the input is small, A2 might be more efficient. Analysis of algorithms on small inputs is generally not interesting, as in these cases they spend most of their time in start-up code.
  o When the input is large, A1 is more efficient. We are interested in the cases when the input is large, also called the difficult instances.
  o So A1 is more efficient than A2.

Asymptotic-Complexity Classes
Name          Symbol
Constant      O(1)
Logarithmic   O(log n)
Linear        O(n)
Log-linear    O(n log n)
Quadratic     O(n^2)
Cubic         O(n^3)
Polynomial    O(n^p)
Exponential   O(b^n)
Factorial     O(n!)
Incomputable  O(∞)

Algorithms vs Programs
• Proving correctness of an algorithm is very important
  o a well-designed algorithm is guaranteed to work correctly and its performance can be estimated
• Proving correctness of a program (an implementation) is fraught with weird bugs
  o Abstract Data Types are a way to bridge the gap between mathematical algorithms and programs

Algorithms and Data
• Algorithms operate on data.
  o An algorithm is executed on the processor.
  o The data is stored in main memory.
• Main memory:
  o Organised into addressable cells/locations that store data.

Arrays
• An array (data structure) is a collection of data items or elements (variables, values) which are identified by integer indices.
• All the elements in an array have the same type.

Arrays and Records
• The memory address of each array cell can be computed from its index using a very simple mathematical formula.
  o The address of the first cell of the array is address(A[0]) = s. This is implementation dependent.
  o The address of the ith cell is address(A[i]) = s + i (taking each cell to occupy one addressable location).
• In order to access an element of the array:
  o The address of that element is computed in constant time. This is true because a simple addition is performed.
  o Array access is O(1).
• Records (also called structures) are similar to arrays, but they can store items of different types.

Arrays and Records
• Arrays and records are contiguous data structures. Their elements are located next to each other in main memory.
• Advantages of using arrays and records:
  o We can retrieve an array element from its index in constant time, O(1), meaning it costs us asymptotically nothing to look up an element of an array or record. This is very important.
  o They consist solely of data, with no space wasted on links. Compare to linked lists later.
  o Physical continuity/memory locality: if we look up element i, there is a high probability we will look up element i+1 next. This is exploited by cache memory.

Arrays and Records
• Disadvantages of using arrays and records:
  o Static arrays are inflexible structures. We have to decide in advance how much space we want when the array is allocated. Once the block of memory for the array has been allocated, that's it: we're stuck with the size we've got. We can compensate by always allocating arrays larger than we think we'll need, but this wastes a lot of space.
  o Insertion/deletion is expensive. It requires shifting, and re-copying if the array is full.
• Dynamic arrays grow/shrink in size at runtime, so they are relatively flexible, e.g. the ArrayList data structure in Java.

Linked Lists
• Static arrays are used when we know a priori (i.e., before implementation) how many items we need to store.
• For example:
  o Ask the user to input 20 numbers to sort. We know the user will input 20 numbers, so a static array is used.
  o Ask the user to enter some numbers to sort. We don't know how many numbers the user will input.
    - A dynamic array is used. Not all programming languages have efficient support for dynamic arrays. Programming a dynamic array based on dynamic resizing and copying of elements is not efficient.
    - A linked list is used. We can always program it efficiently.

Linked Lists
• Advantages of using linked lists:
  o Very flexible. We don't need to worry about allocating space in advance and can use any free space in memory. We only run out of space when the whole memory is actually full.
  o When doing insertion and deletion, no shifting is required.
  o More efficient for moving large records (leave the data in the same place in memory, just change some pointers).
• Disadvantages of using linked lists:
  o Wasted space: we store both pointers and data.
  o To find the ith item, we must start at the beginning and follow pointers until we get there. In the worst case, if there are n items in a list and we want the last one, we have to do n lookups. So retrieving an element from its position in the list is O(n).

Performance
• We are interested in knowing how costly it is to:
  o Access an item in the data structure.
  o Traverse the data structure (e.g., to search).
  o Insert or delete an item from the data structure.
• Arrays
  o Access is constant.
  o Traversal is linear.
  o Insertion/deletion requires shifting if the array is not full, and copying into a bigger array if the array is full.
• Linked lists
  o Access is linear.
  o Traversal is linear.
  o Insertion/deletion are constant.

Singly, Doubly, and Circularly Linked Lists
The linked lists we have seen so far are called singly linked lists. A node points to its successor in the list. A node has one pointer only.
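The cost claims for singly linked lists (constant-time insertion, linear-time access by position) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class and method names (Node, SinglyLinkedList, insert_front, get, to_list) are hypothetical and are not the course's moodle source files.

```python
class Node:
    """One list node: a data item plus a single pointer to its successor."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


class SinglyLinkedList:
    """Hypothetical sketch of a singly linked list holding only a head pointer."""

    def __init__(self):
        self.head = None

    def insert_front(self, data):
        """O(1): only the head pointer changes; no element is shifted."""
        self.head = Node(data, self.head)

    def get(self, index):
        """O(n): start at the head and follow pointers index times."""
        if index < 0:
            raise IndexError("index out of range")
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError("index out of range")
        return node.data

    def to_list(self):
        """O(n) traversal: visit every node once, collecting the data."""
        items = []
        node = self.head
        while node is not None:
            items.append(node.data)
            node = node.next
        return items
```

Inserting 3, 2, 1 at the front in that order yields the list 1, 2, 3, and get(2) has to follow two pointers before it reaches the value 3.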
There are also doubly linked lists. A node points both to its successor and its predecessor in the list, so the node has two pointers. They consume more space but allow easier traversal for some algorithms.
There are also circularly singly linked lists. The tail points back to the head, so there is no "real" head or tail. A special node called the cursor is used, e.g. to know where to start and finish traversing the circular list.

Example
See the source code on moodle (now in Java; more will be added in C).
• A node of the list: Node.java
• A singly linked list: SinglyLinkedList.java
• An example usage of the linked list: SinglyLinkedListExample.java
• An example usage of the Java library implementation of a linked list: SinglyLinkedListExampleUsingJavaLibraries.java

Concrete Data Structures and Abstract Data Structures/Types
• Concrete Data Structures
  o Arrays, records, and linked lists are concrete data types.
  o They are provided by the computer language.
  o They are stored at specific addresses in memory.
• Abstract Data Structures/Types (ADTs)
  o Offer a higher-level view of our interaction with data, and comprise:
    - Data.
    - Operations on this data.
  o We describe the behaviour of our data structures in terms of abstract operations.
  o However, the way these operations are implemented will affect efficiency.
    - There are different implementations of the same abstract operations.
    - We want the ones we will use most commonly to be the most efficient.

Stacks
• Stack of books
• Stack of plates

Queues
• A queue of people
• Please learn to queue in your everyday life, and teach/tell people around you to queue.
• Queuing is an important rule in life.
• By not queuing:
  o You take other people's rights.
  o You show you are not civilised.

For Next Week:
• Read Chapter 3, particularly about Lists and the Stack ADT.
• Exercise: In pseudocode, swap two adjacent elements in a list using:
  o An array
  o A linked list, by adjusting only the links (and not the data), for:
    a. singly linked lists
    b. doubly linked lists
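As a starting point for the array case only, here is a minimal Python sketch; the linked-list cases are left as posed. The function name swap_adjacent is a hypothetical choice, and a Python list (a dynamic array) stands in for the array.

```python
def swap_adjacent(items, i):
    """Swap items[i] and items[i + 1] in place.

    With an array this is O(1): both cells are reached directly by index.
    """
    if i < 0 or i + 1 >= len(items):
        raise IndexError("need 0 <= i < len(items) - 1")
    items[i], items[i + 1] = items[i + 1], items[i]
```

The array version moves the data itself, which is exactly what the linked-list versions of the exercise must avoid by rewiring pointers instead.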