COP5725 Advanced Database Systems
Spring 2017: Introduction
Tallahassee, Florida, 2017

Welcome to COP5725!
• COP5725: Advanced Database Systems
  – Course website (all you need to know about COP5725): http://www.cs.fsu.edu/~zhao/cop5725/main.html
  – Time: 2pm-3:15pm, Mondays and Wednesdays
  – Venue: LOV 103
• Please read the syllabus carefully before taking the class!
• Instructor
  – Prof. Peixiang Zhao, http://www.cs.fsu.edu/~zhao
  – Office hours: Monday and Wednesday, 3:30pm-4:30pm, or by appointment
  – Office: LOV 262
  – Research interests: databases, data mining, information/social networks, and graph analysis
• TA
  – Dr. Yongjiang Liang
  – Office hours: Thursday, 1:30pm-2:30pm
  – Office: MCH 106-A

The Goal of COP5725!
1. Reflection on the foundation:
  – Climb onto the shoulders: the foundational models, representations, systems, and techniques of relational database systems, by way of reading and lectures
2. Projection on the outlook:
  – And look out from there! Be inspired: what are the next advanced database systems?
  – By way of reading and presenting the classics and the state of the art, and by way of doing projects!
• "We can do it!"

The Contents of COP5725!
• Relational database internals
  – Fundamentals of relational databases
  – Data storage and representation
  – Advanced indexing
  – Query processing and execution
  – Query optimization
  – ……
• Advanced database topics
  – Parallel/distributed databases (MapReduce)
  – Data mining (selected topics)
  – Data on the Web
  – ……

Welcome to COP5725!
• Textbook
  – Database Systems: The Complete Book, 2nd edition, by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom
• Recommended reading
  – Database Management Systems, 3rd edition, by Raghu Ramakrishnan and Johannes Gehrke
  – Readings in Database Systems, 5th edition, by P. Bailis, J. Hellerstein and M. Stonebraker (http://www.redbook.io)
  – The Web
• Prerequisites
  – COP4710: Introduction to Database Systems
  – COP4530: Data Structures and Algorithms
  – Good programming skills
• Components of the course
  1. Two lectures every week (?)
  2. Two assignments (10%)
  3. A series of papers to be read and summarized (15%)
     • A one- or two-page summary of each paper, submitted in class on the due date
  4. Paper presentation (5%)
     • Every group will present one paper related to its project in class for 15(?) minutes
  5. Semester-long project (30%)
     • Research-flavor or implementation-flavor
  6. A set of quizzes (5%)
  7. Final exam (35%)

Paper Summaries
• Milestone papers in database systems
• Every paper will be posted early on the course website and can be downloaded from within the campus network
• A one- to two-page summary should cover:
  – What is the problem?
  – Why is this problem important and worthy of a thorough study?
  – Why is this problem difficult?
  – What are the innovative ideas and technical merits?
  – Comments on the experimental evaluation
  – Any drawbacks and potential improvements?
• Summarize based on your own understanding. Verbatim copying from the paper results in a low score.
• The contents of the papers will be tested in the final exam!

Paper Presentation
• Every group will have a chance to select one paper to present in class
  – The paper should be closely related to the project you are conducting
  – The slides (pptx/ppt/pdf) should be sent to the instructor at least one day before the class in which you will present
  – The slides should be organized along the same lines as the paper summary
  – 15(?) minutes of presentation and Q&A
• Students will sign up for presentations soon

Project
• Theme: choose one of the two
  1. Research-flavor (mainly for Ph.D. students)
     • Find an interesting, nontrivial data management problem and propose a novel and effective solution to it
  2. Implementation-flavor (mainly for M.S. students)
     • Find interesting methods/algorithms in a data management paper, implement them, and perform experimental studies
• Teamwork: groups of one or two students (but no more!)
• The project is partitioned into multiple milestones, each of which requires deliverables
• Pay attention to the workload!

Multi-stage Project
1. Group formation (0%)
2. Project proposal (10%): What do I want to do?
3. Literature survey (20%): What is the state of the art?
4. Status report (10%): What have I achieved thus far?
5. Source code, software and final report (60%): Dude, these are my deliverables!

Implementation Project
• Topics:
  – Choose a research paper published after 2002 in one of the following conferences/journals, implement its idea, and complete all experimental studies related to it
  – Conferences: SIGMOD, VLDB, ICDE, KDD, ICDM, SDM, SIGIR, WWW, CIKM
  – Journals: TODS, VLDB Journal, TKDD, TKDE
• Workload (in C/C++ or Java)
  – 3000-5000 lines of code; real/synthetic data; experimental studies
• Expectation
  – Source code, software, detailed readmes and scripts, and a final report
  – Repeatability, completeness of datasets and experimental studies, efficiency, effectiveness, scalability ……
  – You may demo your implementation to the TA

Research Project
• Topics:
  – A state-of-the-art data management or mining problem in your research area
• Workload
  – Problem definition, algorithm design and analysis, implementation (more than 3000 lines of code, in C/C++ or Java), experimental studies
  – Your innovative ideas!
• Expectation
  – A conference-quality (potentially publishable) paper
  – Source code, software, detailed readmes and scripts
  – You may demo your implementation to the TA

Quizzes
• The first quiz will be held on Wednesday 01/11
  – It counts for 3% of your total grade!
  – Coverage:
     • Fundamentals of relational databases
     • Data structures and algorithms
• The remaining quizzes will be held throughout the semester
  – They also serve to take attendance
  – And to get feedback and suggestions from students

Is This Course Suitable for Me?
• First-day Attendance Policy at FSU
• Prerequisites MUST be satisfied
  – Introduction to database systems
     • Relational model, relational algebra, relational design, SQL, B/B+ trees, hashing, transaction management, crash recovery ……
  – Data structures and algorithms
     • What is the difference between a stack and a queue?
     • What is the worst-case complexity of insertion/deletion in red-black trees?
     • Dijkstra's algorithm for shortest-path computation
     • Set cover is NP-complete
     • ……
• You should feel comfortable programming (a lot)

COP5725 = How DB Knowledge Is Created + How to Create More
• In terms of topics, COP5725 is not:
  – about Linux + Apache + PHP + MySQL (LAMP)
  – about designing DBs that are in BCNF
  – about SQL3 and stored procedures
  – about Oracle tuning and implementation
• In terms of methodology, COP5725 is not solely:
  – about reading the textbook and acing it
  – about implementing a well-specified DB algorithm, e.g., a B+ tree

How to Get the Most out of COP5725?
• Read and think before class
  – Read the textbook for related concepts
  – Read the papers
• Use lectures as a road map for studying
  – Lecture notes won't cover all the material
• Use your peers in learning
  – Discuss in and out of class to deepen your understanding
• Explore interesting projects creatively
  – Learn by doing

Any questions so far?

Evolution of Data Management
• Jim Gray: Evolution of Data Management. IEEE Computer 29(10): 38-46 (1996)

Prehistory Thoughts: Emergence of the Notion of DBMS
• William C. McGee: Generalization: Key to Successful Electronic Data Processing. J. ACM 6(1): 1-23 (1959)
• When data processing was mostly done by ad-hoc programs, generalization was needed, e.g., for:
  – sorting
  – file maintenance
  – data access
  – modification and update
  – report generation
  – ……

How Did We Get Here?
• The dominant relational database system, which we take for granted now, was deemed impossible to implement and difficult to use in its early days
• But, quoting Jim Gray:
  "These innovations give one of the best examples of research prototypes turning into products. The relational model, parallel database systems, active databases, and object-relational databases all came from the academic and industrial research labs. The development of database technology has been a textbook case of successful collaboration between academy and industry."
  (Evolution of Data Management)

Examples
• In industry
• In science: Turing Award winners
  – Charles Bachman, 1973
  – Edgar Codd, 1981
  – James Gray, 1998
  – Michael Stonebraker, 2014

The Grand Challenges of Data Management
• The relational DBMS was invented in the early 70's and is now a mature industry worth $50+ billion
• What are we still working on? Big Data!
  – https://www.youtube.com/watch?v=vbb-AjiXyh0
  – http://www.youtube.com/watch?v=LrNlZ7-SMPk
• What is the ultimate advanced DB?
  – Data of all sorts, prevalent on the Web!
  – What have you been searching for lately?
  – Is what you search for what you want?
• New challenges naturally arise
  – structured vs. unstructured data
  – querying vs. analysis vs. mining vs. learning
  – closed "base" vs. the open Web

Have fun!
• What Does 'Big Data' Mean and Who Will Win? http://research.microsoft.com/apps/video/default.aspx?id=258302&l=i