COP5725 Advanced Database Systems
Spring 2017
Introduction
Tallahassee, Florida, 2017

Welcome to COP5725!
• COP5725: Advanced Database Systems
  – Course website: all you need to know about COP5725
    http://www.cs.fsu.edu/~zhao/cop5725/main.html
  – Time: 2pm-3:15pm Mondays and Wednesdays
  – Venue: LOV 103
• Please go over the syllabus carefully before taking the class!

Welcome to COP5725!
• Instructor
  – Prof. Peixiang Zhao http://www.cs.fsu.edu/~zhao
  – Office hours:
    • Monday, Wednesday: 3:30pm-4:30pm
    • Or by appointment
  – Office: LOV 262
  – Research interest:
    • Database, data mining, information/social network and graph analysis
• TA
  – Dr. Yongjiang Liang
  – Office hours: Thursday 1:30pm-2:30pm
  – Office: MCH 106-A

The Goal of COP5725!
1. Reflection of the foundation:
  – Climb up to the shoulders – the foundational models, representations, systems, and techniques for relational database systems, by way of reading and lectures
2. Projection on the outlook:
  – And look out from here! Be inspired – what are the next advanced database systems? – by way of reading and presenting the classics and the state-of-the-art, and by way of doing projects!
• “We can do it!”

The Contents of COP5725!
• Relational Database Internals
  – Fundamentals for relational databases
  – Data storage and representation
  – Advanced indexing
  – Query processing and execution
  – Query optimization
  – ……
• Advanced Database Topics
  – Parallel/Distributed databases (MapReduce)
  – Data mining (selected topics)
  – Data on the Web
  – ……

Welcome to COP5725!
• Textbook
  – Database Systems: The Complete Book 2nd edition
  – Hector Garcia-Molina, Jeff Ullman and Jennifer Widom
• Recommended reading
  – Database Management Systems 3rd edition, by Raghu Ramakrishnan and Johannes Gehrke
  – Readings in Database Systems 5th edition, by P. Bailis, J. Hellerstein and M.
Stonebraker (http://www.redbook.io)
  – The Web
• Prerequisites
  – COP4710: Introduction to Database Systems
  – COP4530: Data Structures and Algorithms
  – Good programming skills

Welcome to COP5725!
• Components of the course
  1. Two lectures every week (?)
  2. Two assignments (10%)
  3. A series of papers to be read and summarized (15%)
    • One- or two-page paper summary to be submitted during class on the due date
  4. Paper presentation (5%)
    • Every group will present one paper related to the project in class for 15(?) minutes
  5. Semester-long project (30%)
    • Research-flavor
    • Implementation-flavor
  6. A set of quizzes (5%)
  7. Final exam (35%)

Paper Summaries
• Milestone papers in database systems
• Every paper will be assigned early on the course website, and can be downloaded within the campus network
• A one- to two-page summary includes
  – What is the problem?
  – Why is this problem important and worthy of a thorough study?
  – Why is this problem difficult?
  – What are the innovative ideas and technical merits?
  – Comments on the experimental evaluations
  – Any drawbacks and potential improvements?
• Summarize based on your own understanding. Verbatim copying from the paper results in low scores
• Contents of the papers will be tested in the final exam!

Paper Presentation
• Every group will have a chance to select one paper to present in class
  – The paper should be closely related to the project you are conducting
  – The slides (pptx/ppt/pdf) should be sent to the instructor at least one day prior to the class in which you will be presenting
  – The slide organization should follow the requirements of the paper summary
  – 15(?) minutes for presentation and Q&A
• Students will sign up for presentations in the near future

Project
• Theme: choose either of the two
  1. Research-flavor: mainly for Ph.D. students
    • find an interesting, nontrivial data management problem, propose a novel and effective solution to it
  2. Implementation-flavor: mainly for M.S.
students
    • find interesting methods/algorithms in a data management paper, implement them, and perform experimental studies
• Teamwork: a group of one or two students (but no more!)
• The project is partitioned into multiple milestones, each of which requires deliverables
• Pay attention to the workload!

Multi-stage Project
1. Group formation (0%)
2. Project Proposal (10%)
  – What do I want to do?
3. Literature Survey (20%)
  – What is the state of the art?
4. Status report (10%)
  – What I have achieved thus far
5. Source code, software and final report (60%)
  – Dude, these are my deliverables!

Implementation Project
• Topics:
  – Choose a research paper published in the following conferences/journals after 2002, implement the idea and finish all experimental studies related to this idea
  – Conferences: SIGMOD, VLDB, ICDE, KDD, ICDM, SDM, SIGIR, WWW, CIKM
  – Journals: TODS, VLDB Journal, TKDD, TKDE
• Workload (in C/C++ or Java)
  – 3000-5000 lines of code; real/synthetic data, experimental studies
• Expectation
  – Source code, software, detailed readmes and scripts, and a final report
    • Repeatability, Completeness of datasets and experimental studies, Efficiency, Effectiveness, Scalability ……
• You may demo your implementation to the TA

Research Project
• Topics:
  – A state-of-the-art data management or mining problem in your research area
• Workload
  – Problem definition, algorithm design and analysis, implementation (more than 3000 lines of code, in C/C++ or Java), experimental studies
  – Your innovative ideas!
• Expectation
  – A conference-quality (potentially publishable) paper
  – Source code, software, detailed readmes and scripts
  – You may demo your implementation to the TA

Quizzes
• The first quiz will be held on Wednesday 01/11
  – Takes up 3% of your full credit!
  – Coverage:
    • Fundamentals in relational DB
    • Data structures and algorithms
• Remaining quizzes will be held throughout the semester
  – Call for attendance
  – Get feedback and suggestions from students

Is This Course Suitable For Me?
• First-day Attendance Policy at FSU
• Prerequisites MUST be satisfied
  – Introduction to database systems
    • Relational model, relational algebra, relational design, SQL, B/B+ tree, hashing, transaction management, crash recovery……
  – Data structures and algorithms
    • Difference between stack and queue?
    • Worst-case complexity for insertion/deletion in red-black trees?
    • Dijkstra's algorithm for shortest-path computation
    • Set-cover is NP-complete
    • ……
• Feel comfortable with programming (a lot)

COP5725 = How DB Knowledge is created + How to create more
• In terms of topics, COP5725 is not:
  – about Linux + Apache + PHP + MySQL (LAMP)
  – about designing DBs that are in BCNF
  – about SQL3 and stored procedures
  – about Oracle tuning and implementation
• In terms of methodology, COP5725 is not solely
  – by reading the textbook and acing it
  – by implementing a well-specified DB algorithm, e.g., B+tree

How to Get the Most out of COP5725?
• Read and think before class
  – read the textbooks for related concepts
  – read the papers
• Use lectures as a road map for studying
  – Lecture notes won’t cover all the material
• Use your peers in learning
  – discuss in/out of classes to enhance understanding
• Explore interesting projects creatively
  – learning by doing

Any questions so far?

Evolution of Data Management
• Jim Gray: Evolution of Data Management. IEEE Computer 29(10): 38-46 (1996)

Prehistory Thoughts: Emergence of the Notion of DBMS
• William C. McGee: Generalization: Key to Successful Electronic Data Processing. J. ACM 6(1): 1-23 (1959)
• When data processing was mostly ad-hoc programs, generalization was needed, e.g.,
  – sorting
  – file maintenance
  – data access
  – modification and update
  – report generation
  – ……

How Did We Get Here?
• The dominant relational database system, which we take for granted now, was deemed impossible to implement and difficult to use in its early days
• But, quoting Jim Gray:
  These innovations give one of the best examples of research prototypes turning into products. The relational model, parallel database systems, active databases, and object-relational databases all came from the academic and industrial research labs. The development of database technology has been a textbook case of successful collaboration between academy and industry.
  -- Evolution of Data Management

Examples

In Industry

In Science – Turing Awardees
• CHARLES BACHMAN, 1973
• JAMES GRAY, 1998
• EDGAR CODD, 1981
• MICHAEL STONEBRAKER, 2014

The Grand Challenges of Data Management
• Relational DBMS was invented in the early 70’s, and is now a mature 50+ billion industry
• What are we still working on? Big Data!
  – https://www.youtube.com/watch?v=vbb-AjiXyh0
  – http://www.youtube.com/watch?v=LrNlZ7-SMPk
• What is the ultimately advanced DB?
  – Data of all sorts, prevalent on the Web!
  – What have you been searching lately?
  – What you search is what you want?
• New challenges naturally arise
  – structured vs. unstructured data
  – querying vs. analysis vs. mining vs. learning
  – closed “base” vs. the open Web

Have fun!
What Does 'Big Data' Mean and Who Will Win?
http://research.microsoft.com/apps/video/default.aspx?id=258302&l=i