ITEC 3220A
Using and Designing Database Systems
Instructor: Prof. Z. Yang
Course Website:
http://people.math.yorku.ca/~zyang/itec3220a.htm
Office: TEL 3049
Course Objective
• Examine databases, trends in database
management systems and their application in a
wide range of organizational areas
• Provide an overview of database processing, covering
both its history and recent trends in
database management
• Provide the student with exposure to a range of
tools, including a relational DBMS as well as an
object-oriented DBMS
Textbook
• Database Systems: Design,
Implementation, and Management, 12th
Edition, by Carlos Coronel & Steven Morris
Marking Scheme
• Final exam (closed book) - 50%
• Midterm (closed book) - 35%
• Assignments (2 assignments) - 15%
• Lecture notes will be made available at:
http://people.yorku.ca/~zyang/itec3220a.htm
Schedule
• Week 1 Database concepts and the
relational database model
• Week 2 Entity relationship model
• Week 3 Normalization
• Week 4 SQL
• Week 5 SQL + lab
• Week 6 Advanced SQL + lab
Schedule (Cont’d)
• Week 7 Midterm
• Week 8 Database design & case study
• Week 9 Transaction management and
concurrency control
• Week 10 Transaction management and
concurrency control (Cont’d); Data
warehousing
• Week 11 Object-oriented databases
• Week 12 Review
Introduction
Database Systems and
Data Models
Basic Definition
• Data: raw facts
– Constitute building blocks of information
• Information: produced by processing data to
reveal its meaning
– Requires context
– Should be accurate, relevant, and timely to enable
good decision making
• Database: shared, integrated computer structure
housing:
– End-user data - Raw facts of interest to end user
– Metadata: Data about data, through which the end-user data
are integrated and managed
An Example
• Converting data to information
An Example (Cont’d)
• Metadata
What is a Database Management System
(DBMS)
• A collection of programs that manages the
database structure and controls access to the
data stored in the database
• Role of the DBMS
– Intermediary between the user and the database
– Enables data to be shared
– Presents the end user with an integrated view of
the data
– Receives and translates application requests into
operations required to fulfill the requests
– Hides database’s internal complexity from the
application programs and users
DBMS Manages Interaction
Types of Databases
• Centralized database, distributed
database and cloud database
• Single-user database, multiuser
database
• Analytical database, business
intelligence
– Data warehouse
– Online analytical processing
File and File System
• Terminology
– Data
• Raw Facts
– Field
• Group of characters with specific meaning
– Record
• Logically connected fields that describe a person,
place, or thing
– File
• Collection of related records
Example
Problems with File System Data
Processing
• Lengthy development times
• Difficulty of getting quick answers
• Complex system administration
• Lack of security and limited data sharing
• Extensive programming
Structural and Data Dependence
• Structural dependence: Access to a file
is dependent on its structure
– All file system programs must be modified to
conform to a new file structure
• Structural independence: File structure
can be changed without affecting the
application’s ability to access the data
Structural and Data Dependence
• Data dependence
– Data access changes when data storage
characteristics change
• Data independence
– Data storage characteristics can be changed without
affecting the program’s ability to access the
data
• The practical significance of data dependence is the
difference between the logical and physical data formats
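A minimal sketch of the contrast, written in Python with the standard sqlite3 module. The fixed-width record layout, the customer table, and all names and values are hypothetical and only serve to illustrate the idea. The first function depends on the physical position of the data; the second asks the DBMS for an attribute by name and keeps working after the table structure changes.

```python
import sqlite3

# Hypothetical fixed-width file record: name in positions 0-9, phone in 10-17.
RECORD = "Ramas     555-1234"


def phone_from_file_record(record: str) -> str:
    # The program hard-codes the physical layout. Widening the name field
    # would break this slice (structural and data dependence).
    return record[10:18].strip()


def phone_from_dbms(conn: sqlite3.Connection, name: str) -> str:
    # The program names the attribute it wants; the DBMS maps it to storage.
    row = conn.execute(
        "SELECT phone FROM customer WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, phone TEXT)")
conn.execute("INSERT INTO customer VALUES ('Ramas', '555-1234')")

# A structural change: the query in phone_from_dbms is unaffected by it.
conn.execute("ALTER TABLE customer ADD COLUMN area_code TEXT")
```

Calling `phone_from_dbms(conn, "Ramas")` after the `ALTER TABLE` still returns the stored phone number, whereas any change to the field widths in `RECORD` would require editing `phone_from_file_record`.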
Data Redundancy
•Unnecessarily storing same data at different
places
•Islands of information: Scattered data
locations
– Increases the probability of having different versions
of the same data
•Data anomalies
– Modification
– Insertion
– Deletion
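The anomalies listed above can be sketched with a small Python example. The customer file, the agent names, and the phone numbers are hypothetical. Each record repeats the agent's phone number, so a change applied to only one record leaves two versions of the same fact, and deleting a customer can silently delete the only copy of an agent's data.

```python
# Hypothetical customer file in which every record repeats the agent's phone.
customers = [
    {"customer": "Ramas", "agent": "Alex", "agent_phone": "228-1249"},
    {"customer": "Dunne", "agent": "Alex", "agent_phone": "228-1249"},
    {"customer": "Smith", "agent": "Leah", "agent_phone": "882-1244"},
]


def phone_versions(records, agent):
    """Return the distinct phone numbers on file for one agent."""
    return {r["agent_phone"] for r in records if r["agent"] == agent}


# Modification anomaly: the new number is written to one record only,
# so two versions of Alex's phone number now exist.
customers[0]["agent_phone"] = "228-9999"
inconsistent = phone_versions(customers, "Alex")

# Deletion anomaly: removing Leah's only customer also removes the
# only place where Leah's phone number was stored.
customers = [r for r in customers if r["customer"] != "Smith"]
leah_left = phone_versions(customers, "Leah")
```

An insertion anomaly is the mirror image of the deletion case: in this layout a new agent cannot be recorded until that agent has at least one customer.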
Example
Database Systems
• Logically related data stored in a single logical
data repository
– May be physically distributed among multiple storage
facilities
– DBMS eliminates most of file system’s
problems
• Current generation DBMS software:
– Stores data structures, relationships between
structures, and access paths
– Defines, stores, and manages all access paths
and components
Database vs. File Systems
Database Models
• Collection of logical constructs used to
represent data structure and
relationships within the database
– Conceptual models: logical nature of data
representation
– Implementation models: emphasis on how
the data are represented in the database
Database Models: Historic Overview
Hierarchical and Network Models
Hierarchical Models
• Developed to manage large
amounts of data for complex
manufacturing projects
• Represented by an upside-down tree which contains
segments (equivalent of a
file system’s record type)
• Depicts a set of one-to-many
(1:M) relationships
Network Models
• Created to represent complex
data relationships effectively
• Improved database
performance and imposed a
database standard
• Allows a record to have more
than one parent
• Depicts both one-to-many
(1:M) and many-to-many (M:N)
relationships
The Relational Model
• Produced an “automatic transmission”
database that replaced “standard
transmission” databases
• Based on a relation
– Relation or table: Matrix composed of
intersecting tuples and attributes
• Tuple: Row
• Attribute: Column
• Describes a precise set of data
manipulation constructs
Relational Database Management
System (RDBMS)
• Performs basic functions provided by the
hierarchical and network DBMS systems
• Makes the relational data model easier to
understand and implement
• Hides the complexities of the relational
model from the user
Relational Database Model
(Cont’d)
• Schema for the table
– Graphical representation
AGENT
AGENT_CODE | AGENT_LNAME | AGENT_FNAME | AGENT_INITIAL | AGENT_AREACODE | AGENT_PHONE
– Text description
AGENT(AGENT_CODE, AGENT_LNAME,
AGENT_FNAME, AGENT_INITIAL,
AGENT_AREACODE, AGENT_PHONE)
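As an illustration, the AGENT schema can be declared in a relational DBMS. The sketch below uses Python's standard sqlite3 module. The column names come from the schema above, while the data types and the choice of AGENT_CODE as primary key are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names follow the AGENT schema. The data types and the primary key
# are assumptions for this sketch; the slides do not specify them.
conn.execute("""
    CREATE TABLE AGENT (
        AGENT_CODE     INTEGER PRIMARY KEY,
        AGENT_LNAME    TEXT,
        AGENT_FNAME    TEXT,
        AGENT_INITIAL  TEXT,
        AGENT_AREACODE TEXT,
        AGENT_PHONE    TEXT
    )
""")

# Read the attribute names back from the DBMS's own metadata.
# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(AGENT)")]
```

The `PRAGMA table_info` query is one way to see metadata at work: the DBMS stores the table's description alongside the end-user data and can report it on request.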
The Object-Oriented Data Model
(OODM) or Semantic Data Model
• Object-oriented database management
system (OODBMS)
– Based on OODM
•Object: Contains data and their relationships
with operations that are performed on it
– Basic building block for autonomous structures
– Abstraction of real-world entity
•Attributes - Describe the properties of an
object
NoSQL Databases
•Not based on the relational model
•Support distributed database
architectures
•Provide high scalability, high availability,
and fault tolerance
•Support large amounts of sparse data
•Geared toward performance rather than
transaction consistency
•Store data in key-value stores
Hierarchical Model
Advantages
• Promotes data sharing
• Parent/child
relationship promotes
conceptual simplicity
and data integrity
• Database security is
provided and enforced
by DBMS
• Efficient with 1:M
relationships
Disadvantages
• Requires knowledge of physical
data storage characteristics
• Navigational system requires
knowledge of hierarchical path
• Changes in structure require
changes in all application
programs
• Implementation limitations
• No data definition
• Lack of standards
Network Model
Advantages
• Conceptual simplicity
• Handles more relationship
types
• Data access is flexible
• Data owner/member
relationship promotes data
integrity
• Conformance to standards
• Includes data definition
language (DDL) and data
manipulation language (DML)
Disadvantages
• System complexity
limits efficiency
• Navigational system
yields complex
implementation,
application
development, and
management
• Structural changes
require changes in all
application programs
Relational Model
Advantages
• Structural independence
is promoted using
independent tables
• Tabular view improves
conceptual simplicity
• Ad hoc query capability is
based on SQL
• Isolates the end user from
physical-level details
• Improves implementation
and management
simplicity
Disadvantages
• Requires substantial
hardware and system
software overhead
• Conceptual simplicity gives
untrained people the tools
to use a good system poorly
• May promote islands of information
problems
Object-Oriented Model
Advantages
• Semantic content is
added
• Visual
representation
includes semantic
content
• Inheritance
promotes data
integrity
Disadvantages
• Slow development of standards
caused vendors to supply their
own enhancements
– Compromised widely accepted
standard
• Complex navigational system
• Learning curve is steep
• High system overhead slows
transactions
NoSQL
Advantages
• High scalability,
availability, and fault
tolerance are provided
• Uses low-cost
commodity hardware
• Supports Big Data
• Key-value model
improves storage
efficiency
Disadvantages
• Complex programming is
required
• There is no relationship
support
• There is no transaction
integrity support
• In terms of data consistency,
it provides an eventually
consistent model
Chapter 3
The Relational Database Model
Basic Definition
• Entities and Attributes
– Entity is a person, place, event, or thing
about which data is collected
– Attributes are characteristics of the entity
• Tables
– Hold related entities (an entity set)
– Also called relations
– Composed of rows and columns
Table Characteristics
• Two-dimensional structure with rows and columns
• Rows (tuples) represent single entity
• Columns represent attributes
• Row/column intersection represents single value
• Tables must have an attribute to uniquely identify each row
• Column values all have same data format
• Each column has range of values called attribute domain
• Order of the rows and columns is immaterial to the DBMS
Example Tables
Terminology for Relational
Database
Table-Oriented    Set-Oriented    Record-Oriented
Table             Relation        Record type
Row               Tuple           Record
Column            Attribute       Field
Key
• Consists of one or more attributes that
determine other attributes
• Primary key (PK) is an attribute (or a
combination of attributes) that uniquely
identifies any given entity (row).
• Key’s role is based on determination
– If you know the value of attribute A, you can look up
(determine) the value of attribute B
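Determination can be checked against sample data with a short Python sketch. The function name `determines` and the student rows are hypothetical. Note that a sample can only show that A does not determine B; a clean sample does not prove the rule holds for all possible data.

```python
def determines(rows, a, b):
    """Return True if, in `rows`, each value of attributes `a` is
    associated with exactly one value of attributes `b`."""
    seen = {}
    for row in rows:
        key = tuple(row[x] for x in a)
        value = tuple(row[x] for x in b)
        # setdefault stores the first value seen for this key and returns
        # the stored one, so a later mismatch means `a` does not determine `b`.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data.
students = [
    {"stu_num": 1, "lname": "Smith", "gpa": 3.2},
    {"stu_num": 2, "lname": "Smith", "gpa": 2.9},
    {"stu_num": 3, "lname": "Chen", "gpa": 3.2},
]
```

Here `determines(students, ["stu_num"], ["lname"])` is true, because knowing the student number is enough to look up the last name. `determines(students, ["lname"], ["gpa"])` is false, because the two students named Smith have different GPAs.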
Keys (Cont’d)
• Composite key
– Composed of more than one attribute
• Key attribute
– Any attribute that is part of a key
• Superkey
– Any key that uniquely identifies each entity
• Candidate key
– A minimal superkey: a superkey without any unnecessary attributes
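The difference between a superkey and a candidate key can be sketched in Python. The function names and the student rows are hypothetical, and the check is made against sample rows only, so it shows which attribute sets are unique in this data rather than which keys the designer intended.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for `attrs`."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows):
    """Return the superkeys that contain no smaller superkey."""
    names = sorted(rows[0])
    found = []
    # Try small attribute sets first, so every minimal key is found
    # before any larger set that contains it.
    for size in range(1, len(names) + 1):
        for attrs in combinations(names, size):
            contains_smaller_key = any(set(k) < set(attrs) for k in found)
            if is_superkey(rows, attrs) and not contains_smaller_key:
                found.append(attrs)
    return found


# Hypothetical sample data.
students = [
    {"stu_num": 1, "lname": "Smith", "gpa": 3.2},
    {"stu_num": 2, "lname": "Smith", "gpa": 2.9},
    {"stu_num": 3, "lname": "Chen", "gpa": 3.2},
    {"stu_num": 4, "lname": "Smith", "gpa": 3.2},
]
```

In this data `("stu_num", "lname")` is a superkey, since it identifies every row, but it is not a candidate key because `stu_num` alone already does the job.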
Keys (Cont’d)
• Foreign key (FK)
– An attribute whose values match primary key
values in the related table
• Referential integrity
– FK contains a value that refers to an existing valid
tuple (row) in another relation
• Secondary key
– Key used strictly for data retrieval purposes
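Referential integrity can be demonstrated with Python's standard sqlite3 module. The agent and customer tables and their values are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so the sketch enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_lname TEXT)"
)
conn.execute("""
    CREATE TABLE customer (
        cus_code   INTEGER PRIMARY KEY,
        cus_lname  TEXT,
        agent_code INTEGER REFERENCES agent(agent_code)
    )
""")
conn.execute("INSERT INTO agent VALUES (501, 'Alby')")
conn.execute("INSERT INTO customer VALUES (10010, 'Ramas', 501)")   # FK matches a PK
conn.execute("INSERT INTO customer VALUES (10011, 'Dunne', NULL)")  # null FK is allowed


def try_insert(cus_code, cus_lname, agent_code):
    """Attempt an insert; return False if the DBMS rejects it."""
    try:
        conn.execute(
            "INSERT INTO customer VALUES (?, ?, ?)",
            (cus_code, cus_lname, agent_code),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Agent 999 does not exist, so this row would violate referential integrity.
rejected = not try_insert(10012, "Smith", 999)
```

The DBMS, not the application program, refuses the row that points at a nonexistent agent. The same insert with agent code 501 is accepted.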
Simple Relational Database
Controlled Redundancy
• Makes the relational database work
• Tables within the database share common
attributes that enable us to link tables
together.
• Multiple occurrences of values in a table are
not redundant when they are required to
make the relationship work.
• Redundancy is unnecessary duplication of
data
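A short sketch of controlled redundancy, again using Python's sqlite3 module with hypothetical tables and values. The agent code appears in both tables, and it repeats within the customer table, but that repetition is what lets the DBMS link each customer to an agent. The agent's phone number is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_phone TEXT);
    CREATE TABLE customer (cus_lname TEXT, agent_code INTEGER);
    INSERT INTO agent VALUES (501, '228-1249'), (502, '882-1244');
    INSERT INTO customer VALUES ('Ramas', 501), ('Dunne', 501), ('Smith', 502);
""")

# The shared attribute agent_code links the two tables.
rows = conn.execute("""
    SELECT c.cus_lname, a.agent_phone
    FROM customer AS c
    JOIN agent AS a ON c.agent_code = a.agent_code
    ORDER BY c.cus_lname
""").fetchall()
```

Because the phone number lives in one row of the agent table, changing it there updates what every linked customer sees, which avoids the modification anomaly of the file system approach.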
Integrity Rules
Exercises
Table name: TRUCK
Table name: BASE
Table name: TYPE
Exercises (Cont’d)
• For each table, identify the primary key and
the foreign keys.
• Do the tables exhibit entity integrity? Explain.
• Do the tables exhibit referential integrity?
Explain.
• Identify the TRUCK table’s candidate key(s).
• For each table, identify a superkey and a
secondary key.