Introduction - FSU Computer Science

Advanced Database Systems
Spring 2017
Tallahassee, Florida, 2017
Welcome to COP5725!
• COP5725: Advanced Database Systems
– Course website: all you need to know about COP5725
– Time:
2pm--3:15pm Mondays and Wednesdays
– Venue:
LOV 103
• Please go over the syllabus carefully before taking the course
Welcome to COP5725!
• Instructor
– Prof. Peixiang Zhao
– Office hours:
• Monday, Wednesday: 3:30pm-4:30pm
• Or by appointment
– Office: LOV 262
– Research interest:
• Database, data mining, information/social network and graph analysis
• TA
– Dr. Yongjiang Liang
– Office hours: Thursday 1:30pm – 2:30pm
– Office: MCH 106-A
The Goal of COP5725!
1. Reflection of the foundation:
– Climb up to the shoulders
– the foundational models, representations, systems, and techniques
for relational database systems, by way of reading and lectures
2. Projection on the outlook:
– And look out from here! Be inspired
– what are the next advanced database systems?
– by way of reading and presenting the classics and the state of the art, and by way of doing projects!
• “We can do it!”
The Contents of COP5725!
• Relational Database Internals
Fundamentals for relational databases
Data storage and representation
Advanced indexing
Query processing and execution
Query optimization
• Advanced Database Topics
Parallel/Distributed databases (MapReduce)
Data mining (selected topics)
Data on the Web
Welcome to COP5725!
• Textbook
– Database Systems: The Complete Book 2nd edition
– Hector Garcia-Molina, Jeff Ullman and Jennifer Widom
• Recommended reading
– Database Management Systems 3rd edition, by Raghu Ramakrishnan
and Johannes Gehrke
– Readings in Database Systems 5th edition, by P. Bailis, J. Hellerstein,
and M. Stonebraker
– The Web
• Prerequisites
– COP4710: Introduction to Database Systems
– COP4530: Data Structures and Algorithms
– Good programming skills
Welcome to COP5725!
• Components of the course
1. Two lectures every week (?)
2. Two assignments (10%)
3. A series of papers to be read and summarized (15%)
One or two-page paper summary to be submitted during the
class on the due date
4. Paper presentation (5%)
Every group will present one paper related to the project in the
class for 15(?) minutes
5. Semester-long project (30%)
6. A set of quizzes (5%)
7. Final exam (35%)
Paper Summaries
• Milestone papers in database systems
• Every paper will be assigned early via the course website and can
be downloaded from within the campus network
• The one- to two-page summary should cover
– What is the problem?
– Why is this problem important and worthy of a thorough study?
– Why is this problem difficult?
– What are the innovative ideas and technical merits?
– Comments on the experimental evaluations
– Any drawbacks and potential improvement?
• Summarize based on your own understanding. Verbatim copying
from the paper results in low scores
• Content from the papers will be tested on the final exam!
Paper Presentation
• Every group will have a chance to select one paper to present in
the class
– The paper should be closely related to the project you are conducting
– The slides (pptx/ppt/pdf) should be sent to the instructor at least one day
prior to the class you will be presenting
– The organization of the slides should follow the requirements for the paper summaries
– 15(?) minutes presentation and Q&A
• Students will sign up for the presentation in the near future
• Theme: choose either of the two
1. Research-flavor: mainly for Ph.D. students
find an interesting, nontrivial data management problem, propose a
novel and effective solution to it
2. Implementation-flavor: mainly for M.S. students
find interesting methods/algorithms in a data management paper,
implement it, and perform experimental studies
Teamwork: a group of one or two students (but no more!)
The project is partitioned into multiple milestones, each of
which requires deliverables
Pay attention to the workload!
Multi-stage Project
1. Group formation (0%)
2. Project Proposal (10%)
What do I want to do?
3. Literature Survey (20%)
What is the state of the art?
4. Status report (10%)
– What I have achieved thus far
5. Source code, software and final report (60%)
Dude, these are my deliverables!
Implementation Project
• Topics:
– Choose a research paper published in the following conferences/journals after
2002, implement the idea and finish all experimental studies related to this idea
– Journals: TODS, VLDB Journal, TKDD, TKDE
• Workload (in C/C++ or Java)
3000-5000 lines of code; real/synthetic data, experimental studies
Source code, software, detailed readmes and scripts, and a final report
Repeatability, Completeness of datasets and experimental studies, Efficiency,
Effectiveness, Scalability ……
You may demo your implementation to the TA
Research Project
• Topics:
– A state-of-the-art data management, mining problem in your research area
• Workload
Problem definition, algorithm design and analysis, implementation (more
than 3000 lines of code, in C/C++ or Java), experimental studies
Your innovative ideas!
A conference-quality (potentially publishable) paper
Source code, software, detailed readmes and scripts
You may demo your implementation to the TA
• The first quiz will be held on Wednesday 01/11
– Takes up 3% of your full credit!
– Coverage:
• Fundamentals in relational DB
• Data structures and algorithms
• Remaining quizzes will be held throughout the semester
– Call for attendance
– Get feedback and suggestions from students
Is This Course Suitable For Me?
• First-day Attendance Policy at FSU
• Prerequisites MUST be satisfied
– Introduction to database systems
• Relational model, relational algebra, relational design, SQL, B/B+
tree, hashing, transaction management, crash recovery……
– Data structures and algorithms
Difference between stack and queue?
Worst-case complexity for insertion/deletion in Red-black trees?
Dijkstra algorithm for shortest-path computation
Set-cover is NP-complete
• Feel comfortable with programming (a lot)
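As a rough self-check on the data-structure prerequisites listed above, here is a minimal Python sketch (the course projects themselves use C/C++ or Java). It contrasts stack (LIFO) and queue (FIFO) removal order and runs a heap-based Dijkstra for shortest paths. The function names and the adjacency-list format are assumptions made for this illustration; they are not course material.

```python
import heapq
from collections import deque


def stack_and_queue_order(items):
    """Return (LIFO order, FIFO order) for the same sequence of insertions."""
    stack, queue = list(items), deque(items)
    lifo = [stack.pop() for _ in range(len(stack))]      # stack: last in, first out
    fifo = [queue.popleft() for _ in range(len(queue))]  # queue: first in, first out
    return lifo, fifo


def dijkstra(graph, source):
    """Shortest-path distances from source.

    graph maps node -> list of (neighbor, weight) pairs (hypothetical format);
    weights must be non-negative for Dijkstra to be correct.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

If writing something like this from scratch feels unfamiliar, it may be worth reviewing the COP4530 material before the first quiz.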
COP5725 =
How DB Knowledge is created + How to create more
• In terms of topics, COP5725 is not:
– about Linux + Apache + PHP + MySQL (LAMP)
– about designing DBs that are in BCNF
– about SQL3 and stored procedures
– about Oracle tuning and implementation
• In terms of methodology, COP5725 is not solely
– by reading the textbook and acing it
– by implementing a well-specified DB algorithm, e.g., B+tree
How to Get the Most out of COP5725?
• Read and think before class
– read the textbooks for related concepts
– read the papers
• Use lectures as road map for studying
– Lecture notes won’t cover all the material
• Use your peers in learning
– discuss in/out of classes to enhance understanding
• Explore interesting projects creatively
– learning by doing
Any questions so far?
Evolution of Data Management
• Jim Gray: Evolution of Data Management. IEEE
Computer 29(10): 38-46 (1996)
Prehistory Thoughts: Emergence of the Notion of DBMS
• William C. McGee: Generalization: Key to Successful
Electronic Data Processing. J. ACM 6(1): 1-23 (1959)
• When data processing was mostly ad-hoc programs --Need generalization, e.g.,
– sorting
– file maintenance
– data access
– modification and update
– report generation
– ……
How Did We Get Here?
• The dominating relational database system, which
we take for granted now, was deemed impossible to
implement and difficult to use in its early days
• But-- Quoting Jim Gray:
These innovations give one of the best examples of research prototypes
turning into products. The relational model, parallel database systems, active
databases, and object-relational databases all came from the academic and
industrial research labs. The development of database technology has
been a textbook case of successful collaboration between academy
and industry.
-- Evolution of Data Management
In Industry
In Science – Turing Awardees
The Grand Challenges of Data Management
• The relational DBMS was invented in the early ’70s, and is now
a mature 50+ billion dollar industry
• What are we still working on? Big Data!
• What is the ultimately advanced DB?
– Data of all sorts--- Prevalent on the Web!
– What have you been searching lately?
– What you search is what you want?
• New challenges naturally arise
– structured vs. unstructured data
– querying vs. analysis vs. mining vs. learning
– closed “base” vs. the open Web
Have fun!
What Does 'Big Data' Mean and Who Will Win?