Database Systems
236363
Introduction
Bureaucratic Info (1/2)
• Lecturer: Prof. Roy Friedman
– Office: Taub 605
– E-mail: [email protected]
• TA in Charge: Omer Katz
– Office: Taub 603
– E-mail: [email protected]
• TA: Ella Bolshinski
– Office: Taub 315
– E-mail: [email protected]
Bureaucratic Info (2/2)
• Web site: http://webcourse.cs.technion.ac.il/236363/
• Home assignments: 3 dry, 1 wet
• Grade structure:
– 80% final exam, 20% home assignments (takef, i.e., binding)
– 8% for the wet assignment, 4% each dry one
– Home assignments are mandatory: no final grade without
a passing grade in each home assignment
• Those who repeat the course must submit all home assignments
Topics
• Introduction
• Entity Relationship Diagrams (ERD)
• Relational Algebra
• Relational Calculus
• Database Design Theory
– Functional Dependencies
– Schemas Decomposition and Normal Forms
• XML and the Query Language XPath
• Advanced topics or Datalog – time permitting
• In the recitation: The SQL Query Language
What’s New This Semester
• The lecture slides are in English
– Based on the slides of Eldar Fischer and Johann
Makowsky
• Some change in material
– We will not study XQuery
– We might not study Datalog
• Wet home assignment might be in Java
– To be determined next week
Let the Fun Begin…
• Database
– A (persistent) collection of data
– Often with some logical structure
– Examples include bank accounts, students registered for
courses and their grades, geographical data used by a
map/navigation service, customers of an online website,
a dentist’s patients
• A Query Language
– Enables querying and manipulating the database
– Examples include Structured Query Language (SQL),
Datalog, Cassandra Query Language (CQL)
• Database Management System (DBMS)
– The system that manages the database and supports the
execution of queries on the database
Why Do We Need a DBMS?
• After all, we have operating systems and file systems…
• DBMS provides a data oriented abstraction for
manipulating the data
• Enables direct manipulation of the data without
worrying about storage and execution issues
– Frees the programmer from worrying about many low-level
details such as:
• Serializing and de-serializing the data to and from storage
• Organizing the data in storage and masking storage latencies
and inefficiencies
• Enabling concurrent access to the database
• Handling a database that is larger than physical memory
• Split/cached database operation in combined mobile/cloud settings
Data Model
• The data model defines the framework for
how data is represented
– For example, in the relational model, data is
represented as tables (relations)
• The query language enables extracting data
from the database according to the given data
model
– For example, SQL can express queries on tables
and the results are tables as well
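The last point can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses the standard-library sqlite3 module (SQLite is chosen for convenience and is not a system named in these slides) and a hypothetical students table. The query takes a table as input, and its result is again a table, i.e., rows with named columns.

```python
import sqlite3

# In-memory database; the table and its contents are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("John Doe", "Computer Science"), ("Jane Roe", "Electrical Engineering")],
)

# A query over a table; the result is itself a table (column names plus rows).
cursor = conn.execute(
    "SELECT name FROM students WHERE department = 'Computer Science'"
)
result_columns = [col[0] for col in cursor.description]
result_rows = cursor.fetchall()
print(result_columns, result_rows)
```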
DBMS Functionality
• Storage management
• Query processor
– Optimizes query processing for efficient
information retrieval and query execution
• Concurrency control and recovery
• Data integrity
– E.g., that an ID number is a unique 9 digit number
• Security
– Access control, authentication and encryption
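As a rough sketch of the data integrity item, the snippet below asks the DBMS itself to enforce that an ID is a unique 9-digit number. The table name, the helper try_insert, and the use of SQLite through Python's sqlite3 module are all assumptions made for illustration; other systems express the same constraint with somewhat different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the ID is stored as text so that leading zeros survive.
conn.execute(
    """
    CREATE TABLE people (
        id   TEXT PRIMARY KEY
             CHECK (length(id) = 9 AND id NOT GLOB '*[^0-9]*'),
        name TEXT NOT NULL
    )
    """
)

def try_insert(person_id, name):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO people VALUES (?, ?)", (person_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("012345678", "Jane Roe")       # valid 9-digit ID
too_short = try_insert("12345", "John Doe")          # violates the CHECK
duplicate = try_insert("012345678", "Someone Else")  # violates uniqueness
print(accepted, too_short, duplicate)
```

The application code never validates the ID itself; the rule lives in the schema, so every program that uses the database gets the same guarantee.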
Data Representation Independence
• To obtain independence between the data
model and the physical storage, we distinguish
between three levels
– User view, logical level, and physical level
[Figure: three levels, from top to bottom: View, Logical, Physical. Annotations: "Data is organized here according to the data model (relational)"; "Independence of logical layout beyond this level"; "Independence of physical layout beyond this level".]
Upper Database Layer
• In the end-user layer, each user is provided
with a (potentially partial) view that may be
different from the actual data layout
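A partial view can be sketched as follows, assuming SQLite via Python's sqlite3 module and a hypothetical students table and student_directory view (neither name comes from the slides). The view exposes names and IDs while hiding addresses, so a user of the view sees a layout that differs from the stored one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, student_id TEXT, address TEXT)")
conn.execute(
    "INSERT INTO students VALUES ('Jane Roe', '12345678', 'Technion City')"
)

# A partial view: exposes names and IDs but hides addresses.
conn.execute(
    "CREATE VIEW student_directory AS SELECT name, student_id FROM students"
)

view_cursor = conn.execute("SELECT * FROM student_directory")
view_columns = [col[0] for col in view_cursor.description]
view_rows = view_cursor.fetchall()
print(view_columns, view_rows)
```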
Operations on a Database
• Database structure definition (following an analysis of
the application’s needs): Includes the logical structures
for representing the data and their relations
– Data Definition Language (DDL)
• Query execution to retrieve data from the database
• Data manipulation: adding, deleting, and updating
– Data Manipulation Language (DML)
• Administrative operations:
– Defining views, indexes, etc.
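The four kinds of operations listed above might look as follows in SQL, here issued through Python's sqlite3 module. The courses table, the placeholder rows, and the index name are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the logical structure.
conn.execute(
    "CREATE TABLE courses (catalogue_number TEXT PRIMARY KEY, name TEXT)"
)

# DML: add, update, and delete records.
conn.execute("INSERT INTO courses VALUES ('236363', 'Database Systems')")
conn.execute("INSERT INTO courses VALUES ('000000', 'Placeholder Course')")
conn.execute(
    "UPDATE courses SET name = 'Database Systems (Intro)' "
    "WHERE catalogue_number = '236363'"
)
conn.execute("DELETE FROM courses WHERE catalogue_number = '000000'")

# Administrative operation: define an index.
conn.execute("CREATE INDEX courses_by_name ON courses (name)")

# Query: retrieve data.
remaining = conn.execute(
    "SELECT catalogue_number, name FROM courses"
).fetchall()
print(remaining)
```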
Database Administrator (DBA)
• Responsible for
– Planning the logical database layout and adapting
it to the physical layer
– Security and access control
– Recovery management (after failures)
– Performance fine tuning
Data Models (1/2)
• Relational data model
– Data is represented using tables
– Correspondence between tables is obtained by using
the same values in columns with the same name
– The main focus of this course.
• Entity Relationship (ER) model
– A tool for analyzing the requirements of a database
and designing its schema
– This model is an abstract one and has no actual direct
implementation
Data Models (2/2)
• Object Oriented data model
– A model in which the data is represented as objects
similarly to what is done in OOP
• ERD can be mapped to OO
• Semi-structured data model
– Data is represented as a graph (independent of the
physical layout)
• Extensible Markup Language (XML)
– A specific instance of the semi-structured model in
which the graph is a tree
The Relational Data Model
• We wish to represent a collection of objects of a given type, each
characterized by a fixed set of properties
– For example, student’s name, date of birth, department
• How would we do this in C or Java?
• In the relational data model, we maintain these objects in a table
– Each row is used to store one object
– Each column represents one of the properties
Student Name | Date of Birth | Department
John Doe | 01/01/1990 | Computer Science
Jane Roe | 07/07/1992 | Electrical Engineering
… | … | …
• The table is a logical structure
– Might be physically stored in a completely different manner
– Each row must differ from every other row in at least one
attribute
• The table represents a set rather than a multi-set
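One possible answer to the "C or Java" question, sketched here in Python instead: a record type with one field per property, and a set of such records playing the role of the table. The class name Student and the field names are illustrative assumptions. Using a set rather than a list mirrors the set (not multi-set) semantics: a row identical in every attribute is not stored twice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in a set
class Student:
    name: str
    date_of_birth: str
    department: str

students = set()
students.add(Student("John Doe", "01/01/1990", "Computer Science"))
students.add(Student("Jane Roe", "07/07/1992", "Electrical Engineering"))

# Adding a row identical in every attribute changes nothing: a set, not a multi-set.
students.add(Student("John Doe", "01/01/1990", "Computer Science"))

print(len(students))  # 2
```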
Relational Model: Terminology
[Figure: the students table annotated with the following terms]
Student Name | Date of Birth | Department
John Doe | 01/01/1990 | Computer Science
Jane Roe | 07/07/1992 | Electrical Engineering
… | … | …
– Schema: the title of the table
– Attribute: a column name
– Relation: the entire table
– Record: an entire row
A Formal Definition
• Given a set of attributes A1,…,An and a set
of corresponding domains D1,…,Dn (each
domain is a set of values)
• Denote by R(A1,…,An) the relational schema that
contains the attributes A1,…,An
• A relation r over R is a subset of the domain
product: r ⊆ D1 × D2 × … × Dn
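The definition can be made concrete with tiny, made-up domains. In the sketch below, D1 and D2 are assumed finite domains for two attributes (name and department); itertools.product builds D1 × D2, and a relation is simply any subset of it.

```python
from itertools import product

# Small, made-up finite domains for attributes A1 = name, A2 = department.
D1 = {"John Doe", "Jane Roe"}
D2 = {"Computer Science", "Electrical Engineering"}

# D1 × D2: the set of all possible records.
domain_product = set(product(D1, D2))

# A relation r over R(name, department) is any subset of the product.
r = {
    ("John Doe", "Computer Science"),
    ("Jane Roe", "Electrical Engineering"),
}

print(len(domain_product))  # 4 possible records
print(r <= domain_product)  # True: r ⊆ D1 × D2
```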
The Formal Definition - Visualized
• Instead of viewing each record as a line in a
table, it is viewed as a point in the space of all
possible value assignments. The relation is a
finite subset of “all possible records”.
[Figure: a plot with axes Name and Birth Date, in which the records (John Doe, 01/01/1990) and (Jane Roe, 07/07/1992) appear as points]
Keys
• A superkey of a relation r is a subset of attributes of r’s
schema such that specific values of these attributes identify
a single record in r. In other words, there are no two
records in r whose values in all attributes of the superkey
are the same.
• A relation may have several superkeys. A superkey is called
minimal if none of its proper subsets is a superkey. Such a key is
also called a candidate key.
• One of the candidate keys can be selected as the primary key.
The primary key is used to identify a row in the
implementation of the database.
Note, however, that in PostgreSQL, when no key is defined, a table can include multiple
records with the same values in all attributes
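The definitions above translate directly into a small check. The sketch below is an illustration, not part of the course material: the function names is_superkey and candidate_keys and the sample rows are assumptions, and the brute-force search over attribute subsets is only practical for very small schemas. It tests the keys of one particular relation instance, matching the definition given here for a relation r.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    projections = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projections)) == len(projections)

def candidate_keys(rows, all_attrs):
    """Minimal superkeys of this relation instance, smallest first."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip any superset of an already-found key: it is not minimal.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

rows = [
    {"name": "Jane Roe", "student_id": "12345678", "department": "CS"},
    {"name": "Jane Roe", "student_id": "87654321", "department": "EE"},
    {"name": "John Doe", "student_id": "11111111", "department": "CS"},
]
attrs = ("name", "student_id", "department")

print(is_superkey(rows, ("name",)))  # False: two rows share the name Jane Roe
print(candidate_keys(rows, attrs))
```

In this sample, student_id alone is a candidate key, and so is the pair (name, department). The second is an accident of the sample data, which is why keys are normally declared by the designer rather than inferred from the current rows.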
Simple Databases
• In the simplest case, all objects of interest are of the
same “kind”, meaning that they all have the same
attribute list – they are only distinguished by their
specific attribute values
• For example, the list of songs on my computer – each
such object is characterized by the name of the song,
the format (mp3/wma/…), playing time, and size
• In these cases, all objects are organized in a single table
More Involved Databases
• Suppose we wish to design a database for the faculty’s
administrative assistants. Here, we can identify at least two
types of entities:
– Students: student name, id, address
– Courses: course name, catalogue number, lecturer
• If a student is registered for a given course, we should be
able to know about it and be able to retrieve the student’s
final grade
– Hence, each student’s participation in a given course should be
recorded somewhere
• The question is how to organize the database for this?
Possible Organization
• A simple option: Have a single large table – for each
student’s registration, we will hold the student’s name, id,
address, course name, catalogue number, lecturer, and final
grade
• Drawbacks:
– Redundancy: Why should the student’s address be stored in
each course she takes?
– Inadequacy: How can we maintain the details of a student who
does not take any course?
– Difficult to update: If a student changes his address, we will
need to update all records of all courses he is registered for. This
is both expensive and a source of inconsistency
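The update drawback can be observed directly. The sketch below assumes SQLite via Python's sqlite3 module and a hypothetical single wide table named registrations_flat; the second course and its lecturer are invented placeholders. One change of address has to touch one row per course the student takes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registrations_flat ("
    "student_name TEXT, student_id TEXT, address TEXT, "
    "course_name TEXT, catalogue_number TEXT, lecturer TEXT, "
    "final_grade INTEGER)"
)
conn.executemany(
    "INSERT INTO registrations_flat VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Jane Roe", "12345678", "Technion City",
         "Database Systems", "236363", "Rony Superstar", 95),
        ("Jane Roe", "12345678", "Technion City",
         "Hypothetical Course", "000000", "A. Lecturer", 90),
    ],
)

# One change of address must be applied to every registration row.
cursor = conn.execute(
    "UPDATE registrations_flat SET address = 'Haifa' "
    "WHERE student_id = '12345678'"
)
rows_touched = cursor.rowcount
print(rows_touched)  # 2: the address was stored once per course
```

If an update reaches only some of these rows, the table ends up holding two different addresses for the same student, which is the inconsistency mentioned above.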
Another Option
• We can maintain one table for students, one for
courses, and one for registration
• The registration table schema can include a
primary key for the students (e.g., student id) and
a primary key for the courses (e.g., catalogue
number) as well as the final grade
• Now each student’s data is independent of each
course’s data and vice versa
Another Option
Students
Student Name | Student ID | Address
Jane Roe | 12345678 | Technion City
…
Courses
Course Name | Catalogue Number | Lecturer
Database Systems | 236363 | Rony Superstar
…
Registrations
Catalogue Number | Student ID | Final Grade
236363 | 12345678 | 95
…
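A possible rendering of this three-table design is sketched below, assuming SQLite via Python's sqlite3 module. The column names, the second student, and the choice to declare the two references as foreign keys are assumptions for illustration. The sketch also checks that the drawbacks of the single-table design no longer appear: a student without courses can be stored, and an address change touches one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE students (
        student_id TEXT PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE courses (
        catalogue_number TEXT PRIMARY KEY, name TEXT, lecturer TEXT);
    CREATE TABLE registrations (
        catalogue_number TEXT REFERENCES courses (catalogue_number),
        student_id       TEXT REFERENCES students (student_id),
        final_grade      INTEGER,
        PRIMARY KEY (catalogue_number, student_id));
    INSERT INTO students VALUES ('12345678', 'Jane Roe', 'Technion City');
    INSERT INTO students VALUES ('00000000', 'Hypothetical Student', 'Haifa');
    INSERT INTO courses VALUES ('236363', 'Database Systems', 'Rony Superstar');
    INSERT INTO registrations VALUES ('236363', '12345678', 95);
    """
)

# A student with no registrations is stored without any difficulty.
student_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

# An address change now touches exactly one row.
rows_touched = conn.execute(
    "UPDATE students SET address = 'Haifa' WHERE student_id = '12345678'"
).rowcount

# A registration for an unknown student is rejected by the foreign key.
try:
    conn.execute("INSERT INTO registrations VALUES ('236363', '99999999', 80)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(student_count, rows_touched, orphan_rejected)
```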
Life is Full of Difficult Choices
• What if we wish to retrieve the names of all lecturers who
taught a given student?
• If in the registrations table we only maintain the course’s
catalogue number, answering this query requires combining the
registrations and courses tables, which may take a long time to
compute
• Further, if the lecturers change over time, the reply will not be
accurate
– How do we fix this?
• Should we add a lecturer attribute to the registrations table?
• Should we define a new “lecturers” table and corresponding relations?
• The database organizational design choices are not trivial –
this is the subject of much of this course
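To make the lecturer question concrete, the sketch below answers it with a join over the registrations and courses tables. It assumes SQLite via Python's sqlite3 module; the helper lecturers_of, the column names, and the second course are illustrative assumptions. Note that the query returns whoever is currently listed as the lecturer of each course, which is exactly the inaccuracy raised above if lecturers change over time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE courses (
        catalogue_number TEXT PRIMARY KEY, name TEXT, lecturer TEXT);
    CREATE TABLE registrations (
        catalogue_number TEXT, student_id TEXT, final_grade INTEGER);
    INSERT INTO courses VALUES ('236363', 'Database Systems', 'Rony Superstar');
    INSERT INTO courses VALUES ('000000', 'Hypothetical Course', 'A. Lecturer');
    INSERT INTO registrations VALUES ('236363', '12345678', 95);
    """
)

def lecturers_of(student_id):
    """Lecturers currently listed for the courses the student registered for."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.lecturer
        FROM registrations AS r
        JOIN courses AS c ON c.catalogue_number = r.catalogue_number
        WHERE r.student_id = ?
        ORDER BY c.lecturer
        """,
        (student_id,),
    ).fetchall()
    return [lecturer for (lecturer,) in rows]

print(lecturers_of("12345678"))
```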