CS 440
Database Management Systems
Course overview
Welcome to CS440!
• Instructor: Arash Termehchy
• Assistant Professor at EECS
• Research on data management and analytics
• Information & Data Management and Analytics (IDEA) Lab
The Era of Big Data
• Technological shifts, e.g.,
mobile devices, have created a
staggering number of
enormous data sets.
• Both opportunities and
challenges.
Opportunities: unreasonable effectiveness of data
• A. Halevy, et al. "The Unreasonable Effectiveness of Data",
IEEE Intelligent Systems, 2009.
• Observation from working with large datasets at Google:
– More data generally outperforms complex statistical
models in data-centric prediction and discovery.
• Conclusion:
– Usually, no need for overly complex statistical models.
Opportunities are priceless!
The story of John Snow
“In the mid-1850s, Dr. John Snow plotted cholera deaths on a
map, and in the corner of a particularly hard-hit building was a
water pump. A 19th-century version of Big Data, which suggested
an association between cholera and the water pump.”
Integrating data sets has saved millions of lives!
Paradigm-shifting influence on scientific discovery
• “The Fourth Paradigm: Data-Intensive Scientific Discovery”,
Jim Gray
– Empirical
– Theoretical
– Computational
– Data-centric
• The Sloan SkyServer database is one of the most
cited resources in the field of astronomy.
– Astronomical observation => database query
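To make the "observation => query" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table `objects`, its columns (`ra`, `dec`, `magnitude`), and all rows are hypothetical toy values, not the real SkyServer schema or data; the point is only that "looking at" a patch of sky becomes a declarative query over a catalog.

```python
import sqlite3

# In-memory database standing in for a sky-survey catalog.
# Table name, columns, and rows are hypothetical toy values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL, magnitude REAL)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [(1, 10.5, -1.2, 17.3), (2, 10.7, -1.0, 21.9), (3, 200.1, 45.0, 16.8)],
)

# "Observing" a small region becomes a query: find objects inside a
# right-ascension/declination box that are brighter than magnitude 20
# (a lower magnitude means a brighter object).
rows = conn.execute(
    "SELECT obj_id FROM objects "
    "WHERE ra BETWEEN 10 AND 11 AND dec BETWEEN -2 AND 0 AND magnitude < 20 "
    "ORDER BY obj_id"
).fetchall()
print(rows)  # [(1,)]
conn.close()
```

Object 2 lies in the box but is too faint, and object 3 lies outside it, so only object 1 is returned.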
Challenges: data volume
• The Sloan SkyServer will soon store 30 terabytes per day.
• The Large Hadron Collider can generate 500 exabytes per day.
• 90% of the world's data was generated in the last two years (as of 2013).
– Every two years: ten times more data.
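A quick arithmetic check that the two growth figures agree, assuming the total stock of data grows by exactly a factor of ten every two years and nothing is deleted. The symbol $D(t)$, the total amount of data at time $t$ in years, is introduced only for this check:

```latex
D(t) = 10\,D(t-2)
\quad\Longrightarrow\quad
\frac{D(t) - D(t-2)}{D(t)} = 1 - \frac{D(t-2)}{D(t)} = 1 - \frac{1}{10} = 0.9
```

So a tenfold increase every two years means that 90% of all existing data was produced in the most recent two years.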
Challenges: data variety/ diversity
• Database systems used to deal with
a single static database.
• Need to transform and/or integrate a large number
of evolving data sets.
• Impossible to do manually.
“A data integration expert is never without a job”
Challenges: usability
“… (in the next few years) we project a need for 1.5 million
additional analysts in the United States who can analyze data
effectively …”
-- McKinsey Big Data Study, 2012
Current systems are not built for scientists and ordinary
users.
“It may take a PhD in computer science to successfully
deploy a data analytics algorithm!”
The notion of database management system (DBMS)
• Data processing used to be mostly ad-hoc programming.
• W. McGee, "Generalization: Key to Successful Electronic
Data Processing", Journal of the ACM, 1959.
• Generalization, a.k.a. abstraction/data modeling
– File: A sequence of records.
– Operation: sort, select part of the file, …
• Makes data management and processing usable.
– People can learn and use the abstraction instead of
developing new data processing programs.
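A minimal sketch of this idea in Python, assuming a file is modeled as a list of records (dicts) and using hypothetical helper names `sort_file` and `select`. The two operations are written once and reused on any file, rather than writing a new program for each data-processing task.

```python
from typing import Any, Callable, Dict, List

# A record maps field names to values; a "file" is a list of records.
Record = Dict[str, Any]


def sort_file(records: List[Record], key: str) -> List[Record]:
    """Generic sort: works on any file, ordered by any field."""
    return sorted(records, key=lambda r: r[key])


def select(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Generic selection: keep only the records that satisfy the predicate."""
    return [r for r in records if predicate(r)]


# Hypothetical example file of student records.
students = [
    {"name": "Ada", "year": 3},
    {"name": "Bo", "year": 1},
    {"name": "Cy", "year": 2},
]

# Compose the generic operations instead of writing a one-off program.
upperclass = select(sort_file(students, "year"), lambda r: r["year"] >= 2)
print([r["name"] for r in upperclass])  # ['Cy', 'Ada']
```

The user only learns two operations and their parameters; the same `select` call works whether the file holds students, orders, or sensor readings.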
Abstraction is the key
• How to develop usable abstractions for our data?
– Data models, query languages,
– Relational data model, graph data model, …
• How to implement these abstractions efficiently?
– Database system internals
– Storage management, indexing, ….
Topics
• How to develop usable abstractions for our data?
– relational data model
– graph data model
– database programming
• How to implement these abstractions efficiently?
– storage management and indexing
– query processing algorithms
– query optimization
– transaction management
– parallel and distributed data processing
Our plan
• Learn the fundamental concepts and ideas
– Foundational models, algorithms, and systems.
– Textbooks, resources, and lectures.
• Apply them to new problems
– Apply the lessons learned to interesting database
problems.
– By doing assignments.
Learning the fundamentals: Lectures
• Review and discuss the material.
• Lecture material will be available on the course website
after class.
• Provide a road map for studying
– The course material can seem overwhelming.
• Attendance is not required but encouraged.
• Read the course material before the class.
• Participate and ask questions!
Learning the fundamentals: Readings
• Textbooks:
– Database Management Systems, 3rd edition,
R. Ramakrishnan and J. Gehrke.
• Cow book
– Mining of Massive Datasets, Jure Leskovec, Anand
Rajaraman, Jeff Ullman.
• Free online
– Papers for newer material: posted on the course website.
Learning the fundamentals: Readings
• Recommended
– Database Systems: The Complete Book, 2nd edition, Hector
Garcia-Molina, Jeffrey Ullman, and Jennifer Widom.
• The complete book
– Foundations of Databases, Serge Abiteboul, Richard Hull,
Victor Vianu
• Alice book
Learning the fundamentals: Exam
• Midterm exam in class.
– Closed book, closed notes
– Tests your knowledge of the subjects discussed in
class.
– 40% of the overall grade
• No final exam
Apply your understanding: assignments
• Seven assignments:
• Announced on Piazza and posted on the course website.
• Both written and programming.
• Submit using TEACH.
• Write your answers with a word processor and submit them as PDF.
• Start early!
• 60% of the overall grade
How to get the most out of the course?
• Communicate with the course staff
– TAs: Vahid Ghadakchi, Parisa Ataie
– Piazza
• preferred method of communication
– Office hours
• Arash: Tuesday 4:30 – 5:30 pm
• Vahid: Monday/ Wednesday 4 – 5 pm
• Parisa: Monday 9 – 10 am
– Email the staff for other types of questions
• Use [cs440] tag in the subject line.
• Communicate with your peers on course materials and lectures.
• Check the Piazza and course website for announcements or
possible changes in the schedule.
What is next?
• A review of relational model, relational algebra,
and SQL.
• Refresh your memory by working on some
advanced problems on the relational model and
database design.