Download CENG 352 Database Management Systems - COW :: Ceng

Document related concepts

Oracle Database wikipedia , lookup

IMDb wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Database wikipedia , lookup

ContactPoint wikipedia , lookup

Relational model wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Transcript
CENG 352
Database Management Systems
Nihan Kesim Çiçekli
email: [email protected]
URL: http://www.ceng.metu.edu.tr/~nihan
CENG 352
•
•
•
•
Instructor: Nihan Kesim Çiçekli
Office: A308
Email: [email protected]
Lecture Hours: Tue. 11:40 (BMB5);
Thu. 10:40-12:30 (BMB5)
• Course Web page: http://cow.ceng.metu.edu.tr
• Teaching Assistant:
Ömer Baykal, [email protected]
2
Required Background
• Material from CENG351 (EntityRelationship modeling, relational data
model, relational algebra, SQL, B+-trees
and hash-based indexing, algorithms for
sorting/merging data files)
• Intermediate programming skills
– C/C++/Java
3
Text Books and References
1. Raghu Ramakrishnan, Database Management Systems,
McGraw Hill, 3rd edition, 2003 (text book).
2. R. Elmasri, S.B. Navathe, Fundamentals of Database
Systems, 4th edition, Addison-Wesley, 2004.
3. A. Silberschatz, H.F. Korth, S. Sudarshan, Database
System Concepts, McGraw Hill, 4th edition, 2002.
4. H. Garcia-Molina, J. D. Ullman, J. Widom, Database
Systems The Complete Book, Prentice Hall, 2002.
4
Course Outline
•
•
•
•
•
•
•
•
Review of the Relational Data Model
Database Application Development
Relational Database Design and Tuning
Transaction Processing
Concurrency Control
Crash Recovery
Query Processing
Query Optimization
5
Grading
•
•
•
•
Midterm
Assignments and Quizzes
Project
Final Exam
25 %
25 %
20 %
30 %
6
Grading Policies
• Policy on missed midterm:
– no make-up exam; compensated at final (only for the
legal excuses and if informed beforehand)
• Policy on final exam:
– Get at least 30% points from at least one of the written
assignments and at least 30% points from the
programming assignment
– Attendance should be > 50%
• Lateness policy:
– No late submission for the written assignments.
– For the programming homework(s): can be late up to 7
days. Afterwards, will be penalized by 10%* day2
– The last assignment may not be turned in late
– All assignments are to be your own work.
7
Project
• Projects in groups.
– Determine your groups till the end of
add/drops.
• 3 stages:
– Proposal Report (E/R)
– Design Report
– Final Report + Demo
8
Introduction
9
File-Based Systems
• Collection of application programs that
perform services for the end users (e.g.
reports).
• Each program defines and manages its own
data.
10
File-Based Processing
Pearson Education © 2009
11
Limitations of File-Based
Approach
• Separation and isolation of data
– Each program maintains its own set of data.
– Users of one program may be unaware of
potentially useful data held by other programs.
• Duplication of data
– Same data is held by different programs.
– Wasted space and potentially different values
and/or different formats for the same item.
12
Limitations of File-Based
Approach
• Data dependence
– File structure is defined in the program code.
• Incompatible file formats
– Programs are written in different languages, and so
cannot easily access each other’s files.
• Fixed Queries/Proliferation of application
programs
– Programs are written to satisfy particular functions.
– Any new requirement needs a new program.
13
Database Approach
• Arose because:
– Definition of data was embedded in application
programs, rather than being stored separately and
independently.
– No control over access and manipulation of data
beyond that imposed by application programs.
• Result:
– the database and Database Management System
(DBMS).
14
Database
• Shared collection of logically related data (and a
description of this data), designed to meet the
information needs of an organization.
• System catalog (metadata) provides description of
data to enable program–data independence.
• Logically related data comprises entities, attributes,
and relationships of an organization’s information.
15
Database Management System
(DBMS)
• A software system that enables users to define,
create, maintain, and control access to the
database.
• (Database) application program: a computer
program that interacts with database by issuing an
appropriate request (SQL statement) to the
DBMS.
16
Typical DBMS Functionality
• Define a database : in terms of data types,
structures and constraints
• Construct or load the database on a secondary
storage medium
• Manipulating the database : querying, generating
reports, insertions, deletions and modifications to
its content
• Concurrent Processing and Sharing by a set of
users and programs – yet, keeping all data valid and
consistent
17
17
Database Management System
(DBMS)
Pearson Education © 2009
18
Database Approach
• Data definition language (DDL).
– Permits specification of data types, structures and any
data constraints.
– All specifications are stored in the database.
• Data manipulation language (DML).
– General enquiry facility (query language) of the data.
19
Database Approach
• Controlled access to database may include:
–
–
–
–
–
a security system
an integrity system
a concurrency control system
a recovery control system
a user-accessible catalog.
20
Views
• Allows each user to have his or her own
view of the database.
• A view is essentially some subset of the
database.
21
Views - Benefits
• Reduce complexity
• Provide a level of security
• Provide a mechanism to customize the
appearance of the database
• Present a consistent, unchanging picture of
the structure of the database, even if the
underlying database is changed
22
Disadvantages of DBMSs
•
•
•
•
•
•
Complexity
Size
Cost of DBMS
Additional hardware costs
Performance
Higher impact of a failure
23
Components of DBMS Environment
• Hardware
– Can range from a PC to a network of computers.
• Software
– DBMS, operating system, network software (if
necessary) and also the application programs.
• Data
– Used by the organization and a description of this data
called the schema.
• Procedures
– Instructions and rules that should be applied to the
design and use of the database and DBMS.
• People
24
Roles in the Database
Environment
•
•
•
•
•
Data Administrator (DA)
Database Administrator (DBA)
Database Designers (Logical and Physical)
Application Programmers
End Users (naive and sophisticated)
25
History of Database Systems
• First-generation
– Hierarchical and Network
• Second generation
– Relational
• Third generation
– Object-Relational
– Object-Oriented
26
Data Model
• Integrated collection of concepts for describing
data, relationships between data, and constraints
on the data in an organization.
– Purpose is to represent data in an understandable way.
• Data Model comprises:
– a structural part;
– a manipulative part;
– possibly a set of integrity rules.
27
Data Models
• Object-Based Data Models
–
–
–
–
Entity-Relationship
Semantic
Functional
Object-Oriented.
• Record-Based Data Models
– Relational Data Model
– Network Data Model
– Hierarchical Data Model.
• Physical Data Models
Pearson Education © 2009
28
Relational Data Model
Pearson Education © 2009
29
Network Data Model
Pearson Education © 2009
30
Hierarchical Data Model
Pearson Education © 2009
31
Conceptual Modeling
• Conceptual schema is the core of a system
supporting all user views.
• Should be complete and accurate
representation of an organization’s data
requirements.
• Conceptual modeling is the process of
developing a model of information use that
is independent of implementation details.
• The result is the conceptual schema.
32
Objectives of Three-Level
Architecture
• DBA should be able to change database storage
structures without affecting the users’ views.
• Internal structure of database should be unaffected
by changes to physical aspects of storage.
• DBA should be able to change conceptual structure
of database without affecting all users.
33
Three-Level Architecture
34
Three-Level Architecture
• External Level
– Users’ view of the database.
– Describes that part of database that is relevant to a
particular user.
• Conceptual Level
– Community view of the database.
– Describes what data is stored in database and
relationships among the data.
• Internal Level
– Physical representation of the database on the
computer.
– Describes how the data is stored in the database.
35
Differences between Three Levels of
the Architecture
Pearson Education © 2009
36
Data Independence
• Logical Data Independence
– Refers to immunity of external schemas to
changes in conceptual schema.
– Conceptual schema changes (e.g.
addition/removal of entities).
– Should not require changes to external schema
or rewrites of application programs.
37
Data Independence
• Physical Data Independence
– Refers to immunity of conceptual schema to
changes in the internal schema.
– Internal schema changes (e.g. using different file
organizations, storage structures/devices).
– Should not require change to conceptual or
external schemas.
38
Data Independence and the ThreeLevel Architecture
Pearson Education © 2009
39
Database Languages
• Data Definition Language (DDL)
– Allows the DBA or user to describe and name
entities, attributes, and relationships required
for the application
– plus any associated integrity and security
constraints.
• Data Manipulation Language (DML)
– Provides basic data manipulation operations on
data held in the database.
40
System Catalog
• Repository of information (metadata)
describing the data in the database.
• One of the fundamental components of
DBMS.
• Typically stores:
–
–
–
–
–
names, types, and sizes of data items;
constraints on the data;
names of authorized users;
data items accessible by a user and the type of access;
usage statistics.
41
The DBMS Marketplace
• Relational DBMS companies – Oracle, Sybase – are among the
largest software companies in the world.
• IBM offers its relational DB2 system.
• Microsoft offers SQL-Server, plus Microsoft Access for the cheap
DBMS on the desktop, answered by “lite” systems from other
competitors.
• Relational companies also challenged by “object-oriented DB”
companies.
• But countered with “object-relational” systems, which retain the
relational core while allowing type extension as in OO systems.
42
Three Aspects to Studying DBMS's
1. Modeling and design of databases.
– Allows exploration of issues before committing to an
implementation.
2. Programming: queries and DB operations like
update.
– SQL
3. DBMS implementation.
43
Components of a DBMS
• Major components of a DBMS:
–
–
–
–
–
–
Query processor
Database manager (DM)
File manager
DML preprocessor
DDL compiler
Catalog manager
44
Components of a DBMS
• Major software components for database
– Transaction manager
manager
–
–
–
–
Authorization control
Command processor
Integrity checker
Query optimizer
– Scheduler
– Recovery manager
– Buffer manager
45
Components of a DBMS
Pearson Education © 2009
46
Components of Database Manager
(DM)
47
Pearson Education © 2009
Multi-User DBMS Architectures
• Teleprocessing
• File-server
• Client-server
48
Teleprocessing
• Traditional architecture.
• Single mainframe with a number of terminals
attached.
• Trend is now towards downsizing.
49
Pearson Education © 2009
File-Server
• File-server is connected to several workstations
across a network.
• Database resides on file-server.
• DBMS and applications run on each workstation.
• Disadvantages include:
– Significant network traffic.
– Copy of DBMS on each workstation.
– Concurrency, recovery and integrity control more complex.
Pearson Education © 2009
50
File-Server Architecture
Pearson Education © 2009
51
Traditional Two-Tier Client-Server
• Client (tier 1) manages user interface and runs
applications.
• Server (tier 2) holds database and DBMS.
• Advantages include:
–
–
–
–
–
wider access to existing databases;
increased performance;
possible reduction in hardware costs;
reduction in communication costs;
increased consistency.
Pearson Education © 2009
52
Traditional Two-Tier Client-Server
Pearson Education © 2009
53
Traditional Two-Tier Client-Server
Pearson Education © 2009
54
Three-Tier Client-Server
• Client side presented two problems preventing
true scalability:
– ‘Fat’ client, requiring considerable resources on client’s computer
to run effectively.
– Significant client side administration overhead.
• By 1995, three layers proposed, each potentially
running on a different platform.
55
Three-Tier Client-Server
• Advantages:
– ‘Thin’ client, requiring less expensive hardware.
– Application maintenance centralized.
– Easier to modify or replace one tier without affecting
others.
– Separating business logic from database functions
makes it easier to implement load balancing.
– Maps quite naturally to Web environment.
56
Three-Tier Client-Server
Pearson Education © 2009
57