Download Document

Document related concepts

Big data wikipedia , lookup

Versant Object Database wikipedia , lookup

Expense and cost recovery system (ECRS) wikipedia , lookup

Data Protection Act, 2012 wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Data center wikipedia , lookup

Data model wikipedia , lookup

Data analysis wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

3D optical data storage wikipedia , lookup

Information privacy law wikipedia , lookup

Data vault modeling wikipedia , lookup

Clusterpoint wikipedia , lookup

Business intelligence wikipedia , lookup

Database model wikipedia , lookup

Transcript
Databases
Chapter 5
Objectives
Understanding the strategic advantage of using
Databases?
How does it support operations?
How does it support management?
What are the “best practices” today?
Concepts
Major types of databases.
Data warehouses and data mining.
Logical data elements.
Fundamental database structures.
Database development.
Databases are Important
and Exciting
Q: Why are databases so important?
A: Today’s business enterprises cannot survive
or succeed without quality data about their
internal operations and external environment.
Q: Why are database so exciting?
A: This question makes no sense; its like asking
why are Brad Pitt and Angelina Jolie so hot?
Data Resource Management
Definition:
A managerial activity that applies information
systems technologies to the task of managing an
organization’s data resources to meet the
information needs of their business
stakeholders.
Deep down inside, you all want to be data
resource managers; you really do; trust me!
Foundation Data Concepts
Character – single alphabetic, numeric or other symbol
Field – group of related characters
Entity – person, place, object or event
Attribute – characteristic of an entity
Foundation Data Concepts
Record – collection of attributes that describe an
entity
File – group of related records
Database – integrated collection of logically
related data elements
Logical Data Elements
Entities and Relationships
Entities and Relationships
Entities
Real world things with attributes
Abstract things that can be recorded
A Bill, A Person, A Building, A Product, A
Fishmonger, A Recipe
Relationships
How are the entities related
How are entities connected logically
Relationships
Parent has Child
Customer makes An Order
Student takes A Class
Fishmonger works at A Dock
Recipe includes An Ingredient
Brad Pitt dates Angelina Jolie
Types of Databases
Types of Databases
Operational – store detailed data needed to support
the business processes and operations of a company
Distributed – databases that are replicated and
distributed in whole or in part to network servers at a
variety of sites
External – contain a wealth of information available
from commercial online services and from many
sources on the World Wide Web
Hypermedia – consist of hyperlinked pages of
multimedia
Data Warehouse
Definition:
Large database that stores data that have been
extracted from the various operational, external,
and other databases of an organization
Data Warehouse System
Data Mart
Definition:
Databases that hold subsets of data from a data
warehouse that focus on specific aspects of a
company, such as a department or a business
process
Data Warehouse & Data Marts
Data Mining
Definition:
Analyzing the data in a data warehouse to reveal
hidden patterns and trends in historical business
activity
Data Mining
Data Mining Uses
Perform “market-basket analysis” to identify new
product bundles.
Find root causes to quality or manufacturing
problems.
Prevent customer attrition and acquire new
customers.
Cross-sell to existing customers.
Profile customers with more accuracy.
Traditional File Processing
Definition:
Data are organized, stored, and processed in
independent files of data records
Flat file
BUS100,ACCT 1,SMITH J.,M 10-11
BUS100,ACCT 1,JONES A., T 11-12
BUS200,ACCT 2,SMITH J., T 1-2
BUS200,ACCT 2,ANDERSON B.,T 3-4
CSI120,PROG 1,WILLIAMS C.,W 4-6
Problems with “flat files”
Concurrent access update problems
Serialization
Searching for a record – sequential
Inserting a record
Update/change size of record
Deleting a record
Problems of File Processing
Data Redundancy – duplicate data requires an update
to be made to all files storing that data
Lack of Data Integration – data stored in separate files
require special programs for output making ad hoc
reporting difficult
Data Dependence – programs must include information
about how the data is stored so a change in storage
format requires a change in programs
Database Management Approach
Definition:
Consolidates data records into one database that can
be accessed by many different application programs.
Software interface between users and databases
Data definition is stored once, separately from
application programs
Database Management Approach
Database Management Software
(DBMS)
Definition:
Software that controls the creation,
maintenance, and use of databases
Database Interrogation
Definition:
Capability of a DBMS to report information from the
database in response to end users’ requests
Query Language – allows easy, immediate access to
ad hoc data requests
Report Generator - allows quick, easy specification of
a report format for information users have requested
Natural Language vs. SQL Queries
Database Structures
Hierarchical – relationships between records
form a hierarchy or treelike structure
Network – data can be accessed by one of
several paths because any data element or
record can be related to any number of other
data elements
Relational Database Structure
Definition:
All data elements within the database are viewed
as being stored in the form of simple tables
Relational Database
M : N relationship
N
M
EMPLOYEE
WORKS_
ON
PROJECT
Modeling Relationships
using Tables
1:1
1:Many (1:N)
Many : Many (N:M)
Instructors
InstID
LName
FName
100
Babych
Dave
200
Sundin
Susan
300
Griffey
Rose
400
Chan
Tom
Compensation
InstID
Salary
Benefits
100
60000
YES
One to One
200
45000
YES
PK
300
75000
NO
400
39000
YES
Instructors
InstID
100
LName
Babych
FName
Dave
Requires
FOREIGN
KEY
200
Sundin
Susan
300
Griffey
Rose
400
Chan
Tom
One to Many
Sections
SectID
InstID
Term
C195
100
S-05
C220
300
S-05
C295
300
S-05
F750
400
S-05
Many-to-many Instructor/Courses
No-No’s: PK violation
Instr
Name
Course
Name
Time
Smith
Smith
Jones
CSIS114
CSIS120
CSIS114
MW 1-2
TR 3-4
MW 9-10
Many-to-many Instructor/Courses
No-No’s: Multiple entry fields
Instr
Name
Course
Name
Time
Smith
Jones
CSIS114, CSIS120
CSIS114
MW 1-2, TR 3-4
MW 9-10
Instructors
InstID
LName
100
Babych
200
Sundin
300
Griffey
400
Chan
FName
Dave
Susan
Rose
Tom
Courses
CrsID
Title
CPSC310
CPSC440
ENGL150
MKTG350
MIS
Intro. Prog.
Short story
Marketing
Section – linking Table
SecID
InstID
CrsID
Term
C195
100
MKTG350
F-00
C220
300
CPSC310
F-00
C295
300
CPSC440
W-00
F750
400
CPSC440
S-00
FK
FK
Searching a file
UNSORTED file:
Sequential search
On average, must read N/2 records
Very inefficient
Search Keys
Primary keys – unique (SSN) or SID !
ORDERS the Table and is always indexed
Must be unique
Indexed allows fast search
Secondary keys – together make the record
unique (NAME, BIRTHDAY)
Fig 5.2
Search a sorted file
(using direct access)
Can use Binary Search for a record
Skip to “Middle” of file
Compare search key with record
If smaller – record in first half of file
If larger – record in second half of file
Continue until record found or no more records to
search
Efficient – about Log base 2(N) probes
Indexing
Adding additional navigation information
Help locate records quickly
Primary Index:
Contains KEY field and BLOCK number
Only KEY of first record in BLOCK
Do binary search on (smaller) Index file
Primary Key Index
Non-dense
Block anchor key, Pointer
Aaron, Ed
Adams, John
...
Aaron, Ed
ENTIRE RECORD
Other names...
ENTIRE RECORD
Acosta, Marc
ENTIRE RECORD
Blocks on Disk
Adams, John
ENTIRE RECORD
Other names...
ENTIRE RECORD
Akers, Jan
ENTIRE RECORD
Secondary index
Need way to also quickly find records by an
attribute that the file is not stored by. (nonprimary index)
EX: file stored by customer number, need to
reference by name as well.
ABLE
1, CHARLIE
BAKER
2, PAUL
CHARLIE
3, DAVID
DAVID
4, ABLE
MARY
5, BAKER
PAUL
6, MARY
Dense Secondary Index
Database approach:
advantages
Centralized repository
Program/db independence
Can form new data relationships
Reduce redundancy
Data integrity – transaction rollback
Data recovery – log files
More database advantages
Logical view of data – Schemas
Standardized access routines
Security
Concurrency – record locking
DBMS utilities – reorganize/backup etc.
Disadvantages???
$$$$
Additional staff
All eggs in one basket
Lock in to a specific vendor
Example
Design database tables that model an advisor at
Siena and student advisees.
Advisor Attributes:



Professor Employee ID (PK)
Name
Department
Student



Siena ID (PK)
Name, address, Major, e-mail address etc.
Advisor ID (FK)
Design tables to model
Actors in Movies
Movie table
Title or movie-id [PK]
Studio
Year filmed, etc.
Actor table
Actor ID [PK]
Actor name
Address
Age, etc
Acted-In Link table
Movie Title [FK]
Actor ID [FK]
Compensation
Role the actor played,
etc.
Database Management Systems (DBMSs)
re-cap
Provide a user view
Provide tools to create & modify the database
Store & retrieve data
Manipulate data
Produce reports
Data Design Issues
Content: What data should be collected?
Access: What data should be given to what
users?
Logical structure: How will the data be organized
to make sense to a particular user?
Physical organization: Where will the data
actually be located? What indexes to build?
Fig 5.11
Fig 5.17
Selecting a DBMS
Database size
Number of concurrent users
Performance
Integration
Selecting a DBMS
Features
Vendor
Cost