Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data Structures and Algorithms http://sist.sysu.edu.cn/~qiaohy/DS2012 Lectures • 3 hour lecture every Tuesday afternoon • 2 hour lab session every Tuesday afternoon • A new approach to DS: interactions, discussions, learning by doing. • Check course page regularly: http://sist.sysu.edu.cn/~qiaohy/DS2012/ • Important to attend both the lectures and lab sessions, and to finish assignments on time. References • Text book: “Data Structures and Algorithm Analysis in C++”, 3rd edition, Mark A. Weiss. • References: – The art of computer programming, V.1, V. 3, Donald Knuth. – Data Structures and Program Design in C++, Robert L. Kruse and Alexander J. Ryba – Introduction to Algorithms, Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rovest and Clifford Stein. – 数据结构,严蔚敏,吴伟民,清华大学出版社。 – Tons of learning materials on the internet. Grading Scheme • Grading is based on – Home work (online exercises, written assignments or programming assignments) and attendance 20% – Midterm examination 30 % – Final examination 50% – Bonus: active participation • Scores are given for both “theories” and “labs”. Assignments • Everything will be available online, and check course page regularly • Written assignments and on-line exercises(Trakla2) – Due by time specified – Finished independently • Programming assignments – Due by time specified – Run on PC – group collaboration of 2 people is encouraged both for the programming assignments and lab sessions. Plagiarism Policy • st1st Time: both get 0 •1 : both get negative; 2nd Time: need to terminate •• cannot be trusted; Midterm Final: FAIL •• used to beorfailed in an theautomatic final. You are encouraged to collaborate in study groups. But, you cannot copy or slightly change other students’ solutions or codes. Course Overview • A fundamental computer science course - Essential for programming - Essential for advanced courses • A challenging course, which concerns problem solving by programming and needs – Programming – Mathematical thinking and Algorithmic thinking. Why Data Structures Matter? • Problem: Write a program which can search a person’s telephone number from a pile of name cards? From problem space to solution space (左建卫,84458290) (Zuo jianwei, 84458290) (Liu Ning, 84138355) … (Gao Min, 39943222) From problem space to solution space Abstraction: • Represent physical items by data, for example, a card is represented by its name and phone number (“Gao Ming”, “40088888”). • A pile of cards -> a set of data of the form (name, phone number); • Now the problem is: given a set of the pair (name, phone number) and a name target, which you want to look up, for example, “Gao Ming”, is there a pair whose first component is target? If yes, return its second component. Otherwise, fail. From Problem Space to Solution Space Solutions: think about how you do it in practice • We want to organize the set of data, for example, organize them into a sequence, and then you look at each data one by one. • We say that the set of data has a linear logical structure (Wang Bingbing, 98876544) (Zuo jianwei, 84458290) (Liu Ning, 84138355) … (Gao Min, 39943222) From Problem Space to Solution Space Design the Algorithm, which is a list of well-defined instructions that solve a problem. • How to represent data? Use data type. • How to represent each pair (name, phone number)? – Use record type • How to represent the set of pairs, or how to store the set of data? – Using array, for example, • The algorithm: given an array of records and the target, compares every record in the array with target until the name of the record equals to target. • Implement the algorithm, or write the program. Solution 1 • Input is a (linear) sequence of records and a target • Storage: using arrays • Algorithm: sequential search C++: struct Card { string name; string phone; }; int sequentialSearch(Card st[], int n, const string &target) { /*search a record whose name is target and returns the position of the record if search is successful, returns n otherwise. */ int i; for (i = 0; i<n && st[i].name != target;i++); return i; } Evaluation of the algorithm • Is the algorithm correct? – Proving • Is the implementation correct? – Testing • How efficient is the search algorithm? – Algorithm analysis Solution 2 • Input is a (linear) sequence of records and a target • Storage: using linked lists • Algorithm: sequential search 用C++表示: struct Node{ Card data; Node * next; }; Node* sequentialSearchLinked(Node *head, const string &target) { /*search a record whose name is target and returns the pointer to the record if search is successful, returns NULL otherwise. */ Node *p=head; while (p!=NULL&&(p->data).name!=target) { p=p->next; }; return p; Solution 3 & 4 • Using binary search: input is ordered by name; storage using arrays and binary search; • Using binary search trees: input is organised as a ordered tree, tree is stored using linked representation and using binary tree search. struct TreeNode{ Card data; TreeNode *llink,*rlink; }; Solution 5 • Using ADT (abstract data type) map int main() { map<const char*, string,ltstr> phones; //initialize a map container phones[“Dai Yun"] = “34035188”; //insert a record phones[“Liang Hao"] = “84133365”; phones["Zhang Hen"] = “34567890”; phones[“Wang Ming"] = “66908143”; phones[“Luo Feng"] = “84134566”; map<const char*, string, ltstr>::iterator cur = phones.find("Zhang Min"); if (cur != phones.end()) //cur points to the end of the map and to the record othereise. cout << "Zhang Min's phone is " << (*cur).second << endl; else cout << "Zhang Min doesn't exist in phones." <<endl; } From Problem Space to Solution Space • Abstraction: what are the input data, the operations and output data? • One needs to organise the data or design the logic structure to be able realize the operations; • The data and operations can be packed into an ADT(Abstract Data Type); • Decide how to store the data or design physical structures of the data; • Design the algorithm; • Evaluate the solution: running time and storage; • Some solutions are much better than others. Some takes O(log n) running time while other take O(n). • Implement the algorithm, and test it again its specification. Course Outline • C++ review • Algorithmic asymptotic analysis – Big-Oh, big-Theta, and big-Omega • ADTs: Lists, stacks, and queues • Sorting – Insertion, mergesort, quicksort, heapsort, radix, etc. – Lowest bound on sorting • Search trees – Binary search tree – AVL tree – B- tree • Hashing • Graphs – Breadth-first search and Depth-first search – Shortest paths – Minimal spanning trees 19 Overall Goals of the Course • Learn to solve problems by programming • Learn to analyze your solutions • From programmer to architect C++ reviews: Pointers and References • Pointers – Pointers store addresses and aim to realize dynamic memory allocation; – Use new operator to allocate memory and delete to deallocate it. Make sure no memory leak. – Problem: if two pointers point to the same address and the memory is reclaimed by deleting one pointer, then the other pointer is not valid any more. – Used to change values in C. For example, void swap(int *p, int *q). In C++ one can also use pass by reference which is more efficient. • References – A reference is a synonym for a variable and always refers to the same variable. It must be initialized when it is declared. – Used in parameter passing for output or for efficiency (const references). Parameter Passing • When parameters are passed by value (call by value), the actual parameters are copied. Not efficient for large data. • When parameters are to be changed, pass by reference can be used instead. • When parameters are not to be changed, pass by const reference can be used. • Example: double avg(const vector<int>& arr, int n, bool & errorFlag); /*returns the average of first n numbers in arr, errorFlag is true if successful, false otherwise. */ – The first one uses pass by const reference, because it is a big data and should not be changed in the function, the second one is a primitive type and uses pass by value, and the third one uses pass by reference because it outputs some information. The Big Three • The destructor is called whenever an object goes out of scope or is subjected to a delete to free up any resources that were allocated during the use of the object. • The copy constructor is called when an object is initialized to a copy of some object. – IntCell b =c; //or IntCell b( c), but not b=c. – Parameter passing by value foo(IntCell a); – An object returned by value. • The assignment operator = is called when = is applied to two objects which have been constructed previously. Lhs=rhs is to copy the state of rhs into lhs. • The big three, destructor, copy constructor and operator =, are provided by the compiler when they are not provided by the class. The default applies destructor for each data member. The default applies copy constructor to each data member. The default operator = is implemented by applying operator = to each data member. Problems with the big three class IntCell { public: explicit IntCell( int initialValue = 0 ); int read( ) const; void write( int x ); private: int *storedValue; }; IntCell::IntCell( int initialValue =0) { storedValue = new int( initialValue ); } What is missing there? What will be the effect? See Fig.1.15. Problems with the big three int foo(){ IntCell vitalData; for (int i=1;i<100000;i++){ IntCell localV;//local object localV = vitalData; //default ‘=‘ is used } /* Two problems: 1)memory leak by localV if default destructor is applied because default destructor doesn’t release the memory taken by localV’s member storedValue; 2) destruction of localV results in vitalData destroyed if user-defined destructor is applied, which releases memory shared by localV’s member and vitalData’s member. */ Problems with the big three void destroyData(IntCell copy){//see how actual parameter is passed. } destroyData( vitalData); /* default copy used, when the copy is destroyed (if there is a user-defined destructor), memory allocated to the copy is reclaimed although vitalData is still alive. */ Problems with the big three • Default are not adequate when a class has a data member which is a pointer. • Memory allocated to b remains allocated because the default destructor does nothing for storedPointer and results in memory leak. • The default copy constructor and operator = simply copy the value of the pointer (shallow copy, shallow semantics, reference semantics). The copy shares the same IntCell object with the original object. When the original is changed or destructed, the result is not predictable. • What we need is a deep copy, that is, we want to copy the object, not the reference to the object. IntCell with the big-three class IntCell { public: explicit IntCell( int initialValue = 0 ); IntCell( const IntCell & rhs ); ~IntCell( ); const IntCell & operator=( const IntCell & rhs ); int read( ) const; void write( int x ); private: int *storedValue; }; Deep copy is needed IntCell::IntCell(const IntCell &rhs){ storedValue = new int(*rhs.storedValue); } IntCell::~IntCell(){ delete storedValue; } const IntCell & IntCell::operator=(const IntCell &rhs){ if(this !=&rhs) *storedValue = *rhs.storedValue; return *this; } Reading and assignments • Weiss chapter 1 • Understand the following terminologies: memory leak, pass by value, pass by reference, shallow copy, shallow semantics, deep copy, deep semantics, automatic objects, dynamic objects, accessors, mutators, algorithms. • Try to answer the following questions: – What are the differences between points and references? – When the three parameter passing mechanisms are used? – When default copy constructors are not applicable? – Where the modifier const should be used? • Exercises: 1.7a, 1.11a.b, 1.12 a