FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT
Lecture 16
TIM 50 Autumn 2012
Tuesday November 20, 2012
Announcement
1. The grades for every assignment will be given in eCommons.
2. Check the course webpage for the latest information and any changes to assignments.
3. No office hours on Wednesday and Friday (11/21, 11/23). No class on Thanksgiving Day (Thursday, 11/22).
Final exam: 1st choice Friday, December 7; 2nd choice Monday, December 10, depending on schedule permission. The format is the same as the midterm.
Coverage: material up to the midterm, 30% or less; material after the midterm, 70% or more.
Topics in Business Intelligence
The problems of managing data resources in a traditional file environment
Important database design principles
The database management system
The capabilities and value of a database management system
Tools and technologies for accessing information from databases
Business intelligence, data mining
The role of information policy, data administration, and data quality assurance in the management of a firm’s data resources
Foundation of Business Intelligence
Division-oriented paper file systems with manual processing:
Data redundancy, data inconsistency, program-data dependence
Lack of flexibility
Poor security
Lack of data sharing and availability
Database systems: relational DB, object-oriented DB, DBMS
Database system (information management) versus file management system (file processing procedure)
File management system:
System inefficiencies
Longer business cycle
No firm-wide information or data access
No data security
No decisions based on integrated data and information
High business process expenditure
Database systems (DBMS, SQL):
Intelligence from the collection of data
Information management
Business applications
Data integrity control
Business data maintenance
Less redundancy
Data integrity
Efficiency
Data confidentiality
Organizing Data in a Traditional File Environment
File organization concepts
Database: Group of related files
File: Group of records of the same type
Record: Group of related fields; a record describes an entity
Field: Group of characters forming a word (or words) or a number
Entity: A person, place, or thing on which we store information
Attribute: Each characteristic, or quality, describing an entity
E.g., the attributes Date and Grade belong to the entity COURSE
THE DATA HIERARCHY
A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.
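The hierarchy can be sketched in Python as a rough illustration. The course values below are hypothetical, and a real DBMS does not store records as Python tuples; the point is only how each level groups the one beneath it.

```python
# Bit -> byte: one ASCII character occupies one 8-bit byte.
char = "A"
byte = char.encode("ascii")          # b'A', a single byte
bits = format(byte[0], "08b")        # the eight bits that make up that byte

# Field -> record -> file -> database (hypothetical COURSE data).
record = ("IS101", "F12", "B+")      # fields: course id, date, grade
course_file = [record, ("IS200", "F12", "A")]   # records of the same type
database = {"COURSE": course_file}               # a group of related files

print(bits)                          # 01000001
print(len(database["COURSE"]))       # 2
```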
Information as Processed Data
Problems with the traditional file environment
Old business process:
Files maintained separately by different departments
Data redundancy: Presence of duplicate data in multiple files
Data inconsistency: The same attribute has different values in different files
Program-data dependence: Changes in a program require changes to the data accessed by the program
Lack of flexibility
Poor security
Lack of data sharing and availability
TRADITIONAL FILE PROCESSING
The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications. Each application requires a unique data file that is likely to be a subset of the master file. These subsets of the master file lead to data redundancy and inconsistency, processing inflexibility, and wasted storage resources.
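A small sketch of how redundancy turns into inconsistency. The two "department files", the addresses, and the helper `inconsistent_ids` are all hypothetical; only the sid and name come from the example tables later in this lecture.

```python
# Two departments keep their own copy of the same student data (redundancy).
registrar_file = {"53666": {"name": "Jones", "address": "12 Oak St"}}
housing_file = {"53666": {"name": "Jones", "address": "12 Oak St"}}

# The student moves, but only the registrar's file is updated.
registrar_file["53666"]["address"] = "9 Pine Ave"

def inconsistent_ids(a, b, attr):
    """Return the IDs whose attribute value differs between two files."""
    return [k for k in a.keys() & b.keys() if a[k][attr] != b[k][attr]]

print(inconsistent_ids(registrar_file, housing_file, "address"))  # ['53666']
```

A DBMS avoids this by storing the address once, so there is only one copy to update.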
Business Processes with Old Data Processing
System inefficiencies
Longer business cycle
No firm-wide information or data access
No data security
No decisions based on integrated data and information
High business process expenditure
Introduction of Data Processing System
Database – collection of persistent data from business divisions
Database Management System (DBMS) – software system that supports creation, population, and querying of a database
The Database Approach to Data Management
Database
Serves many business applications by centralizing data and controlling redundant data across division boundaries
Database management system (DBMS)
Interfaces between applications and physical data files
Separates logical and physical views of data
Solves problems of traditional file environment
Controls redundancy
Eliminates inconsistency
Uncouples programs and data
Enables organization to centrally manage data and data security
Definition
Although it is difficult to give a universally agreed definition of a database, we use the following common definition:
Definition: A database is a collection of related, logically coherent data used by the application programs in an organization.
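DATABASE ARCHITECTURE
The American National Standards Institute/Standards Planning and Requirements Committee (ANSI/SPARC) has established a three-level architecture for a DBMS: internal, conceptual, and external. The internal level describes how the data is physically stored, the conceptual level describes the logical structure of the whole database, and the external level describes the views of the data seen by individual users.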
Components of a DBMS
Hardware
The hardware is the physical computer system that allows access to data.
Software
The software is the actual program that allows users to access, maintain, and update data. In addition, the software controls which users can access which parts of the data in the database.
Data
The data in a database is stored physically on the storage devices. In a database, data is a separate entity from the software that accesses it.
Users
In a DBMS, the term users has a broad meaning. We can divide users into two categories: end users and application programs.
Procedures
The last component of a DBMS is a set of procedures or rules that should be clearly defined and followed by the users of the database.
Advantages of databases
Compared with a flat-file system, a database system has several advantages.
Less redundancy
In a flat-file system there is a lot of redundancy. For example, in the flat-file system for a university, the names of professors and students are stored in more than one file.
Avoidance of inconsistency
If the same piece of information is stored in more than one place, then any change to the data must be made in every place the data is stored; otherwise the copies no longer agree.
Efficiency
A database is usually more efficient than a flat-file system, because a piece of information is stored in fewer locations.
Data integrity
In a database system it is easier to maintain data integrity, because a piece of data is stored in fewer locations. Data integrity also covers guidelines for data retention, which specify or guarantee how long data can be retained.
Confidentiality
It is easier to maintain the confidentiality of the information if the storage of data is centralized in one location.
Evolution of Database Technologies
Evolution of database systems
• 2000 and beyond: multi-tier, client-server
• Distributed environments
• Web-based
• Content-addressable storage, data mining
DATA BASE MODEL OVERVIEW
• ER-Model
• Hierarchical Model
• Network Model
• Relational Model
• Object-Oriented Model(s)
ER-Model
• Data structures
• Integrity constraints
• Operations
• The ER-Model is extremely successful as a database design model
• Translation algorithms to many data models exist
• Commercial database design tools, e.g., ERwin
• No generally accepted query language
• No database system is based on the model
ER: Entity-Relationship
ER-Model - Integrity Constraints (diagram notation)
• Cardinality: 1:n for E1:E2 in R (E1 marked 1 and E2 marked n on relationship R)
• Total participation of E2 in R
• Weak entity type E2 with identifying relationship type R
• Specialization of E into E2 and E3: d = disjoint, x = exclusion, p = partition
• Key attribute A of entity type E2
Hierarchical Database Model
In the hierarchical model, data is organized as an inverted tree. Each entity has only one parent but can have several children. At the top of the hierarchy, there is one entity, which is called the root.
An example of the hierarchical model representing a university
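An inverted tree with one root can be sketched as nested dictionaries, as below. The department and professor names, and the helper `path_to`, are hypothetical. Because every node has exactly one parent, there is exactly one access path from the root to any node.

```python
# Each node has exactly one parent; the root ("University") has none.
university = {
    "name": "University",
    "children": [
        {"name": "CS Dept", "children": [{"name": "Prof. Lee", "children": []}]},
        {"name": "Math Dept", "children": [{"name": "Prof. Kim", "children": []}]},
    ],
}

def path_to(node, target, path=()):
    """Return the single root-to-target access path, or None if not found."""
    path = path + (node["name"],)
    if node["name"] == target:
        return path
    for child in node["children"]:
        found = path_to(child, target, path)
        if found:
            return found
    return None

print(path_to(university, "Prof. Kim"))  # ('University', 'Math Dept', 'Prof. Kim')
```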
Network Database Model
In the network model, the entities are organized in a graph, in which some entities can be accessed through several paths (Figure 14.4).
An example of the network model representing a university
Object-Oriented Databases (OODB)
An object-oriented database tries to keep the advantages of the relational model while allowing applications to access structured data. In an object-oriented database, objects and their relations are defined. In addition, each object can have attributes that can be expressed as fields.
XML
A language often used with object-oriented databases is XML (Extensible Markup Language). As we discussed in Chapter 6, XML was originally designed to add markup information to text documents, but it has also found application in databases because it can represent data with nested structures. Strictly speaking, XML is a markup language rather than a query language; the query language defined by the ODMG standard for object databases is OQL.
Object-Oriented Model
• Based on the object-oriented paradigm, e.g., Simula, Smalltalk, C++, Java
• Adds persistence and database capabilities to an object-oriented repository model (see ODMG-93, ODL, OQL)
• Commercial object-oriented systems include GemStone, Ontos, Orion-2, Statice, Versant, O2
Relational Database Model
In the relational model, data is organized in two-dimensional tables called relations. The tables or relations are, however, related to each other, as we will see shortly.
An example of the relational model representing a university
Relational DBMS:
Represents data as two-dimensional tables called relations or files. In a relational database management system (RDBMS), the data is represented as a set of relations.
Each table contains data on an entity and its attributes
Table: grid of columns and rows
Rows (tuples): Records for different entities
Fields (columns): Represent attributes of the entity
Key field: Field used to uniquely identify each record
Primary key: Field in a table used as the key field
Foreign key: Primary key used in a second table as a look-up field to identify records from the original table
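The key definitions above can be tried with SQLite through Python's built-in `sqlite3` module. This is a minimal sketch; the table and column names (`supplier`, `part`, `supplier_number`) and the sample rows are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,   -- primary key: uniquely identifies each record
    supplier_name   TEXT NOT NULL)""")
conn.execute("""CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,
    part_name       TEXT NOT NULL,
    supplier_number INTEGER REFERENCES supplier(supplier_number))""")  # foreign key

conn.execute("INSERT INTO supplier VALUES (1, 'Supplier A')")
conn.execute("INSERT INTO part VALUES (10, 'Door latch', 1)")  # valid look-up

try:
    conn.execute("INSERT INTO part VALUES (11, 'Mirror', 99)")  # there is no supplier 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```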
Relations
A relation appears as a two-dimensional table. The RDBMS organizes the data so that its external view is a set of relations or tables. This does not mean that data is stored as tables: the physical storage of the data is independent of the way in which the data is logically organized.
An example of a relation
A relation in an RDBMS has the following features:
Name. Each relation in a relational database should have a name that is unique among other relations.
Attributes. Each column in a relation is called an attribute. The attributes are the column headings in the table in Figure 14.6.
Tuples. Each row in a relation is called a tuple. A tuple defines a collection of attribute values. The total number of rows in a relation is called the cardinality of the relation. Note that the cardinality of a relation changes when tuples are added or deleted. This makes the database dynamic.
Schemas
• The name of a relation together with its set of attributes is called a schema.
• We show the schema for a relation as the relation name followed by a parenthesized list of its attributes, e.g.:
• Movies (title, year, length)
• Relational database schema = collection of relation schemas.
RELATIONAL DATABASE TABLES
A relational database organizes data in the form of two‐dimensional tables. Illustrated here are tables for the entities SUPPLIER and PART showing how they represent each entity and its attributes. Supplier Number is a primary key for the SUPPLIER table and a foreign key for the PART table.
Operations of a Relational DBMS
Three basic operations used to develop useful sets of data
SELECT: Creates subset of data of all records that meet stated criteria
JOIN: Combines relational tables to provide user with more information than available in individual tables
PROJECT: Creates subset of columns in table, creating tables with only the information specified
THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS
The select, join, and project operations enable data from two different tables to be combined and only selected attributes to be displayed.
Relational Database Example
• Relational Database Management System (RDBMS)
– Consists of a number of tables and a single schema (definition of tables and attributes)
– Students (sid, name, login, age, gpa): Students identifies the table; sid, name, login, age, and gpa identify attributes; sid is the primary key
An Example Table
• Students (sid: string, name: string, login: string, age: integer, gpa: real), instance S1
sid | name | login | age | gpa
50000 | Dave | dave@cs | 19 | 3.3
53666 | Jones | jones@cs | 18 | 3.4
53688 | Smith | smith@ee | 18 | 3.2
53650 | Smith | smith@math | 19 | 3.8
53831 | Madayan | madayan@music | 11 | 1.8
53832 | Guldu | guldu@music | 12 | 2.0
Another table: Courses
• Courses (cid, instructor, quarter, dept)
cid | instructor | quarter | dept
Carnatic101 | Jane | Fall 06 | Music
Reggae203 | Bob | Summer 06 | Music
Topology101 | Mary | Spring 06 | Math
History105 | Alice | Fall 06 | History
Keys
• Primary key – minimal subset of fields that is a unique identifier for a tuple
– sid is the primary key for Students
– cid is the primary key for Courses
• Foreign key – connections between tables
– Courses (cid, instructor, quarter, dept)
– Students (sid, name, login, age, gpa)
– How do we express which students take each course?
Many-to-many relationships
• In general, we need a new table:
Enrolled (cid, grade, studid)
studid is a foreign key that references sid in the Student table
Enrolled
cid | grade | studid (foreign key)
Carnatic101 | C | 53831
Reggae203 | B | 53832
Topology112 | A | 53650
History105 | B | 53666
Student
sid | name | login
50000 | Dave | dave@cs
53666 | Jones | jones@cs
53688 | Smith | smith@ee
53650 | Smith | smith@math
53831 | Madayan | madayan@music
53832 | Guldu | guldu@music
Relational Algebra
A procedural process for working with relations
• Collection of operators for specifying queries
• Query describes step‐by‐step procedure for computing answer (i.e., operational)
• Each operator accepts one or two relations as input and returns a relation as output
• Relational algebra expression composed of multiple operators
Basic operators
• Selection – return rows that meet some condition
• Projection – return column values
• Union
• Cross product
• Difference
• Other operators can be defined in terms of basic operators
Simplified Schema Example
• Courses (cid, instructor, quarter, dept)
• Students (sid, name, gpa)
• Enrolled (cid, grade, studid)
Set Operations
• Union (R ∪ S)
– All tuples in R or S (or both)
– R and S must have same number of fields
– Corresponding fields must have same domains
• Intersection (R ∩ S)
– All tuples in both R and S
• Set difference (R – S)
– Tuples in R and not S
Set Operations (continued)
• Cross product or Cartesian product (R x S)
– All fields in R followed by all fields in S
– One tuple (r, s) for each pair of tuples r ∈ R, s ∈ S
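Python's built-in sets behave like relations under set semantics, so the four operations above map directly onto set operators. A minimal sketch; the two small relations R and S (sid, name) are illustrative values drawn from the student table, not part of the slide.

```python
from itertools import product

R = {("53666", "Jones"), ("53688", "Smith")}
S = {("53688", "Smith"), ("53832", "Guldu")}   # same arity and domains as R

union = R | S           # tuples in R or S (or both)
intersection = R & S    # tuples in both R and S
difference = R - S      # tuples in R and not in S
# Cross product: all fields of r followed by all fields of s, for every pair.
cross = {r + s for r, s in product(R, S)}

print(intersection)     # {('53688', 'Smith')}
print(len(cross))       # 4, i.e. |R| * |S|
```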
Selection
Select students with gpa higher than 3.3 from S1:
$\sigma_{gpa>3.3}(S1)$
S1
sid | name | gpa
50000 | Dave | 3.3
53666 | Jones | 3.4
53688 | Smith | 3.2
53650 | Smith | 3.8
53831 | Madayan | 1.8
53832 | Guldu | 2.0
Result
sid | name | gpa
53666 | Jones | 3.4
53650 | Smith | 3.8
Projection
Project name and gpa of all students in S1:
$\pi_{name,gpa}(S1)$
S1
sid | name | gpa
50000 | Dave | 3.3
53666 | Jones | 3.4
53688 | Smith | 3.2
53650 | Smith | 3.8
53831 | Madayan | 1.8
53832 | Guldu | 2.0
Result
name | gpa
Dave | 3.3
Jones | 3.4
Smith | 3.2
Smith | 3.8
Madayan | 1.8
Guldu | 2.0
Combine Selection and Projection
• Project name and gpa of students in S1 with gpa higher than 3.3:
$\pi_{name,gpa}(\sigma_{gpa>3.3}(S1))$
S1
sid | name | gpa
50000 | Dave | 3.3
53666 | Jones | 3.4
53688 | Smith | 3.2
53650 | Smith | 3.8
53831 | Madayan | 1.8
53832 | Guldu | 2.0
Result
name | gpa
Jones | 3.4
Smith | 3.8
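Selection, projection, and their combination can be sketched over S1 held as a list of dicts. The helper names `select` and `project` are illustrative, not a library API. `project` drops duplicate rows to mimic the set semantics of relational algebra, which is why projecting only `name` from S1 would return five rows, not six.

```python
# S1 (sid, name, gpa), copied from the tables above.
S1 = [
    {"sid": 50000, "name": "Dave", "gpa": 3.3},
    {"sid": 53666, "name": "Jones", "gpa": 3.4},
    {"sid": 53688, "name": "Smith", "gpa": 3.2},
    {"sid": 53650, "name": "Smith", "gpa": 3.8},
    {"sid": 53831, "name": "Madayan", "gpa": 1.8},
    {"sid": 53832, "name": "Guldu", "gpa": 2.0},
]

def select(rel, pred):
    """Selection (sigma): keep the rows (tuples) that satisfy pred."""
    return [row for row in rel if pred(row)]

def project(rel, attrs):
    """Projection (pi): keep only the named columns, dropping duplicate rows."""
    out = []
    for row in rel:
        t = tuple(row[a] for a in attrs)
        if t not in out:
            out.append(t)
    return out

high = select(S1, lambda r: r["gpa"] > 3.3)      # sigma_{gpa>3.3}(S1)
print([r["sid"] for r in high])                  # [53666, 53650]
print(project(high, ["name", "gpa"]))            # [('Jones', 3.4), ('Smith', 3.8)]
```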
Example: Intersection
S1
sid | name | gpa
50000 | Dave | 3.3
53666 | Jones | 3.4
53688 | Smith | 3.2
53650 | Smith | 3.8
53831 | Madayan | 1.8
53832 | Guldu | 2.0
S2
sid | name | gpa
53666 | Jones | 3.4
53688 | Smith | 3.2
53700 | Tom | 3.5
53777 | Jerry | 2.8
53832 | Guldu | 2.0
$S1 \cap S2$ =
sid | name | gpa
53666 | Jones | 3.4
53688 | Smith | 3.2
53832 | Guldu | 2.0
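The intersection above can be checked with Python sets of tuples. In this sketch the whole tuple must match, not just the sid, which is why 53700 and 53777 (present only in S2) and 50000 (present only in S1) drop out.

```python
S1 = {(50000, "Dave", 3.3), (53666, "Jones", 3.4), (53688, "Smith", 3.2),
      (53650, "Smith", 3.8), (53831, "Madayan", 1.8), (53832, "Guldu", 2.0)}
S2 = {(53666, "Jones", 3.4), (53688, "Smith", 3.2), (53700, "Tom", 3.5),
      (53777, "Jerry", 2.8), (53832, "Guldu", 2.0)}

both = S1 & S2  # tuples present in both relations
for t in sorted(both):
    print(t)
# (53666, 'Jones', 3.4)
# (53688, 'Smith', 3.2)
# (53832, 'Guldu', 2.0)
```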
Joins
• Combine information from two or more tables
• Example: students enrolled in courses:
$S1 \bowtie_{S1.sid=E.studid} E$
S1
sid | name | gpa
50000 | Dave | 3.3
53666 | Jones | 3.4
53688 | Smith | 3.2
53650 | Smith | 3.8
53831 | Madayan | 1.8
53832 | Guldu | 2.0
E
cid | grade | studid
Carnatic101 | C | 53831
Reggae203 | B | 53832
Topology112 | A | 53650
History105 | B | 53666
Joins
$S1 \bowtie_{S1.sid=E.studid} E$ =
sid | name | gpa | cid | grade | studid
53666 | Jones | 3.4 | History105 | B | 53666
53650 | Smith | 3.8 | Topology112 | A | 53650
53831 | Madayan | 1.8 | Carnatic101 | C | 53831
53832 | Guldu | 2.0 | Reggae203 | B | 53832
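A join can be understood as a cross product followed by a selection. The sketch below (the helper `theta_join` is illustrative) builds exactly the result table above from S1 and E with a nested loop; real DBMSs use faster join algorithms, but the result is the same relation.

```python
S1 = [(50000, "Dave", 3.3), (53666, "Jones", 3.4), (53688, "Smith", 3.2),
      (53650, "Smith", 3.8), (53831, "Madayan", 1.8), (53832, "Guldu", 2.0)]
E = [("Carnatic101", "C", 53831), ("Reggae203", "B", 53832),
     ("Topology112", "A", 53650), ("History105", "B", 53666)]

def theta_join(r, s, cond):
    """R join_cond S = sigma_cond(R x S): concatenate each matching pair of tuples."""
    return [a + b for a in r for b in s if cond(a, b)]

result = theta_join(S1, E, lambda s, e: s[0] == e[2])  # S1.sid = E.studid
for row in result:
    print(row)
# (53666, 'Jones', 3.4, 'History105', 'B', 53666)
# (53650, 'Smith', 3.8, 'Topology112', 'A', 53650)
# (53831, 'Madayan', 1.8, 'Carnatic101', 'C', 53831)
# (53832, 'Guldu', 2.0, 'Reggae203', 'B', 53832)
```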
Relational Data Model: summary
• Relation as table
– Rows = tuples
– Columns = components
– Names of columns = attributes
– Relation name + set of attribute names = schema, written REL (A1, A2, ..., An)
– (Diagram: a table with attribute headings A1, A2, A3, ..., An and rows such as (a1, a2, a3, ..., an), (b1, b2, a3, ..., cn), (a1, c2, b3, ..., bn), ..., (x1, v2, d3, ..., wn); each row is a tuple, each value in a row is a component, the number of columns is the arity, and the number of rows is the cardinality)
• Set theoretic
– Domain: a set of values, like a data type
– Cartesian product (or product) $D_1 \times D_2 \times \cdots \times D_n$: the set of n-tuples $(V_1, V_2, \ldots, V_n)$ such that $V_1 \in D_1, V_2 \in D_2, \ldots, V_n \in D_n$
– Relation = subset of the Cartesian product of one or more domains; FINITE only; the empty set is allowed
– Tuples = members of a relation instance
– Arity = number of domains
– Components = values in a tuple
– Domains correspond with attributes
– Cardinality = number of tuples
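The set-theoretic definition can be checked directly: a relation is a finite subset of $D_1 \times D_2$. In this sketch the two domains and the relation instance are hypothetical values chosen for illustration.

```python
from itertools import product

names = {"Jones", "Smith"}   # domain D1 (hypothetical)
grades = {"A", "B"}          # domain D2 (hypothetical)

cartesian = set(product(names, grades))        # D1 x D2: every possible 2-tuple
relation = {("Jones", "B"), ("Smith", "A")}    # one finite subset = a relation instance

assert relation <= cartesian                   # a relation is a subset of the product
arity = len(next(iter(relation)))              # number of domains (columns)
cardinality = len(relation)                    # number of tuples (rows)
print(len(cartesian), arity, cardinality)      # 4 2 2
```

The empty set is also a subset of `cartesian`, which matches the rule that an empty relation is allowed.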
What Is an Object-Oriented Database (OODB)?
• A database system that incorporates all the important object-oriented concepts
• Some additional features
– Unique object identifiers
– Persistent object handling
Object-Oriented Concepts
• Abstract data types
– Class definition; provides extension to complex attribute types
• Encapsulation
– Implementation of operations and object structure hidden
• Inheritance
– Sharing of data within hierarchy scope; supports code reusability
• Polymorphism
– Operator overloading
Object‐Oriented DBMS (OODBMS)
Stores data and procedures as objects
Objects can be graphics, multimedia, Java applets
Relatively slow compared with relational DBMS for processing large numbers of transactions
Hybrid object‐relational DBMS: Provide capabilities of both OODBMS and relational DBMS
Object-Oriented Databases
• Support data abstraction, encapsulation, and inheritance.
• Allow object identification and communication.
• Reuse and modify objects.
• Deal with complex data types.
Object Relationships
Class representation and object inheritance (diagram): class Employee
Attributes: Name, Parents, Date of Birth, Sex
Methods: GetAge(), ComputeSalary()
Nelson Caballero - 4/16/2001
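The Employee class from the diagram can be sketched in Python to show encapsulation, inheritance, and polymorphism. The attribute and method names follow the slide (`GetAge()` and `ComputeSalary()` become `get_age()` and `compute_salary()` in Python style); the method bodies, the `base_salary` attribute, and the `Manager` subclass are hypothetical additions for illustration. An OODBMS would additionally make such objects persistent, which plain Python does not.

```python
from datetime import date

class Employee:
    """Attributes and methods named on the slide; method bodies are hypothetical."""

    def __init__(self, name, parents, date_of_birth, sex, base_salary):
        self.name = name
        self.parents = parents
        self.date_of_birth = date_of_birth
        self.sex = sex
        self._base_salary = base_salary  # encapsulated state (assumed attribute)

    def get_age(self, today):
        years = today.year - self.date_of_birth.year
        # Subtract one if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years

    def compute_salary(self):
        return self._base_salary

class Manager(Employee):
    """Inheritance: reuses Employee. Polymorphism: overrides compute_salary()."""

    def __init__(self, *args, bonus=0, **kwargs):
        super().__init__(*args, **kwargs)
        self.bonus = bonus

    def compute_salary(self):
        return super().compute_salary() + self.bonus

e = Employee("Ann", [], date(1990, 6, 15), "F", 50000)
m = Manager("Bob", [], date(1985, 1, 1), "M", 70000, bonus=5000)
print(e.get_age(date(2012, 11, 20)))          # 22
print([p.compute_salary() for p in (e, m)])   # [50000, 75000]
```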
Advantages of OODBS
• Designer can specify the structure of objects and their behavior (methods)
• Multimedia Contents
• Better interaction with object‐oriented languages such as Java and C++
• Definition of complex and user‐defined types
• Encapsulation of operations and user‐defined methods
Relational and Object-Oriented Databases
Database Management System
A software system that enables users to create and maintain the database.
Application areas compared for relational and object-oriented DBMSs:
• Decision support applications
• Engineering design applications
• Ordinary business applications
• Multimedia applications
• Applications that integrate with legacy systems
• Knowledge bases
• Conservative implementations
• Applications with demanding distribution and concurrency
• Applications that require advanced features
• Electronic devices with embedded software
Source: Object-Oriented Modeling and Design for Database Applications. Blaha, M. and Premerlani, W.
Database management system (DBMS)
• A specific type of software for creating, storing,
organizing, and accessing data from a database
• Separates the logical and physical views of the data
• Logical view: how end users view data
• Physical view: how data are actually structured and
organized
• Examples of DBMS: Microsoft Access, DB2, Oracle Database, Microsoft SQL Server, MySQL
HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
A single human resources database provides many different views of data, depending on the information requirements of the user. Illustrated here are two possible views, one of interest to a benefits specialist and one of interest to a member of the company’s payroll department.
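Multiple views over one database can be sketched with SQL views in SQLite. The `employee` table, its columns, and the two view names are hypothetical; the point is that each user group queries its own view while the data is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL,
    health_plan TEXT, life_insurance TEXT)""")
conn.execute("INSERT INTO employee VALUES (1, 'Jones', 52000, 'PPO', 'Basic')")

# Each view exposes only the columns one group of users needs.
conn.execute("CREATE VIEW benefits_view AS "
             "SELECT name, health_plan, life_insurance FROM employee")
conn.execute("CREATE VIEW payroll_view AS SELECT name, salary FROM employee")

print(conn.execute("SELECT * FROM benefits_view").fetchall())  # [('Jones', 'PPO', 'Basic')]
print(conn.execute("SELECT * FROM payroll_view").fetchall())   # [('Jones', 52000.0)]
```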
Capabilities of Database Management Systems
Data definition capability: Specifies structure of database content, used to create tables and define characteristics of fields
Data dictionary: Automated or manual file storing definitions of data elements and their characteristics
Data manipulation language (DML): Used to add, change, delete, and retrieve data in the database
Metadata
Data that describes the properties or characteristics of other data
Does not include sample data
Allows database designers and users to understand the meaning of the data
Structured Query Language (SQL)
Microsoft Access uses tools for generating SQL
Many DBMSs have report generation capabilities for creating polished reports (e.g., Crystal Reports)
Each database will have a set of schemas associated with a catalog.
Schema = the structure that contains descriptions of objects created by a user (base tables, views, constraints)
Structured Query Language
Structured Query Language (SQL) is the language standardized by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) for use on relational databases. It is a declarative rather than procedural language, which means that users declare what they want without having to write a step-by-step procedure. SQL was developed at IBM in the 1970s; the first commercially available implementation was released in 1979 by Relational Software, Inc. (the company that later became Oracle Corporation), and various versions of SQL have been released since then.
SQL Is:
• The standard and most common language for relational database management systems
• An SQL-based relational database application involves a user interface, a set of tables in the database, and an RDBMS with an SQL capability
• Within the RDBMS, SQL is used to create the tables, translate user requests, maintain the data dictionary and system catalog, update and maintain the tables, establish security, and carry out backup and recovery procedures
A simplified schematic of a typical SQL environment
3 types of SQL commands
• Data Definition Language (DDL) commands ‐ that define a database, including creating, altering, and dropping tables and establishing constraints
• Data Manipulation Language (DML) commands ‐ that maintain and query a database
• Data Control Language (DCL) commands ‐ that control a database, including administering privileges and committing data
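The command categories can be tried in SQLite from Python. This sketch uses a hypothetical `courses` table; SQLite has no user accounts, so privilege commands such as GRANT and REVOKE cannot be shown here, and only committing (listed above under DCL) is demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and alter the structure.
conn.execute("CREATE TABLE courses (cid TEXT PRIMARY KEY, instructor TEXT)")
conn.execute("ALTER TABLE courses ADD COLUMN dept TEXT")

# DML: maintain and query the data.
conn.execute("INSERT INTO courses VALUES ('Carnatic101', 'Jane', 'Music')")
conn.execute("INSERT INTO courses VALUES ('Topology101', 'Mary', 'Math')")
rows = conn.execute("SELECT cid FROM courses WHERE dept = 'Music'").fetchall()

# Committing makes the changes permanent.
conn.commit()

# DDL again: drop the table.
conn.execute("DROP TABLE courses")
print(rows)  # [('Carnatic101',)]
```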
Insert
The insert operation is a unary operation; that is, it is applied to a single relation. The operation inserts a new tuple into the relation. The insert operation uses the following format:
Figure 14.7 An example of an insert operation
Delete
The delete operation is also a unary operation. The operation deletes a tuple defined by a criterion from the relation. The delete operation uses the following format:
An example of a delete operation
Update
The update operation is also a unary operation that is applied to a single relation. The operation changes the value of some attributes of a tuple. The update operation uses the following format:
An example of an update operation
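The three unary update operations correspond to SQL's INSERT, DELETE, and UPDATE statements. The sketch below runs them against a small students table in SQLite; the new gpa value of 2.5 is hypothetical, and the statements are standard SQL rather than a reproduction of the format figures from the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(53666, "Jones", 3.4), (53688, "Smith", 3.2)])

# Insert: add one new tuple to the relation.
conn.execute("INSERT INTO students VALUES (53832, 'Guldu', 2.0)")
# Delete: remove the tuples that satisfy a criterion.
conn.execute("DELETE FROM students WHERE sid = 53688")
# Update: change attribute values of the tuples that satisfy a criterion.
conn.execute("UPDATE students SET gpa = 2.5 WHERE sid = 53832")

print(conn.execute("SELECT * FROM students ORDER BY sid").fetchall())
# [(53666, 'Jones', 3.4), (53832, 'Guldu', 2.5)]
```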
Select
The select operation is a unary operation. The tuples (rows) in the resulting relation are a subset of the tuples in the original relation.
An example of a select operation
Project
The project operation is also a unary operation and creates another relation. The attributes (columns) in the resulting relation are a subset of the attributes in the original relation.
Figure 14.11 An example of a project operation
Join
The join operation is a binary operation that combines two relations on common attributes.
An example of a join operation
Union
The union operation takes two relations with the same set of attributes and creates a new relation containing every tuple that appears in either relation.
An example of a union operation
Intersection
The intersection operation takes two relations and creates a new relation, which is the intersection of the two.
An example of an intersection operation
Difference
The difference operation is applied to two relations with the same attributes. The tuples in the resulting relation are those that are in the first relation but not the second.
Figure 14.15 An example of a difference operation
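The six operations above have direct SQL counterparts, shown here in SQLite with small subsets of the S1, S2, and E tables from earlier. This is a sketch: SQL tables are bags rather than sets, so `DISTINCT` is used for projection, and the set operators `UNION`, `INTERSECT`, and `EXCEPT` remove duplicates themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s1 (sid INTEGER, name TEXT, gpa REAL);
CREATE TABLE s2 (sid INTEGER, name TEXT, gpa REAL);
CREATE TABLE e  (cid TEXT, grade TEXT, studid INTEGER);
INSERT INTO s1 VALUES (53666, 'Jones', 3.4), (53688, 'Smith', 3.2), (53650, 'Smith', 3.8);
INSERT INTO s2 VALUES (53666, 'Jones', 3.4), (53700, 'Tom', 3.5);
INSERT INTO e  VALUES ('History105', 'B', 53666), ('Topology112', 'A', 53650);
""")

def q(sql):
    return conn.execute(sql).fetchall()

selected = q("SELECT * FROM s1 WHERE gpa > 3.3")                          # select
projected = q("SELECT DISTINCT name FROM s1")                             # project
joined = q("SELECT s1.name, e.cid FROM s1 JOIN e ON s1.sid = e.studid")   # join
union = q("SELECT * FROM s1 UNION SELECT * FROM s2")                      # union
inter = q("SELECT * FROM s1 INTERSECT SELECT * FROM s2")                  # intersection
diff = q("SELECT * FROM s1 EXCEPT SELECT * FROM s2")                      # difference
print(inter)  # [(53666, 'Jones', 3.4)]
```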
DATABASE DESIGN
The design of any database is a lengthy and involved task that can only be done through a step-by-step process.
The first step normally involves interviewing potential users of the database.
The second step is to build an entity-relationship model (ERM) that defines the entities, the attributes of those entities, and the relationships between those entities.
Designing Databases
Conceptual (logical) design: Abstract model from a business perspective
Physical design: How database is arranged on direct‐access storage devices
Design process identifies
Relationships among data elements, redundant database elements
Most efficient way to group data elements to meet business requirements, needs of application programs
Normalization
Streamlining complex groupings of data to minimize redundant data elements and awkward many‐to‐many relationships
Entity-relationship models (ERM)
Database Design
In this step, the database designer creates an entity-relationship (E-R) diagram to show the entities for which information needs to be stored and the relationships between those entities. E-R diagrams use several geometric shapes, but we use only a few of them here:
Rectangles represent entity sets
Ellipses represent attributes
Diamonds represent relationship sets
Lines link attributes to entity sets and link entity sets to relationship sets
A very simple E-R diagram with three entity sets, their attributes, and the relationships between the entity sets.
From E-R diagrams to relations
After the E-R diagram has been finalized, relations (tables) in the relational database can be created.
Relations for entity sets
For each entity set in the E-R diagram, we create a relation (table) in which there are n columns related to the n attributes defined for that set.
Entities, attributes and relationships in an E-R diagram
We can have three relations (tables), one for each entity set defined in the E-R diagram.
Relations for entity set
Relations for relationship sets
For each relationship set in the E-R diagram, we create a relation (table). This relation has one column for the key of each entity set involved in the relationship and also one column for each attribute of the relationship itself, if the relationship has attributes (not in our case).
The relations for these relationship sets are added to the relations already created for the entity sets.
Relations for E-R diagram
Normalization
Normalization is the process by which a given set of relations is transformed into a new set of relations with a more solid structure.
Normalization is needed to allow any relation in the database to be represented, to allow a language like SQL to use powerful retrieval operations composed of atomic operations, to remove anomalies in insertion, deletion, and updating, and to reduce the need for restructuring the database as new data types are added.
First normal form (1NF)
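When we transform entities or relationships into tabular relations, there may be some relations in which there is more than one value at the intersection of a row and a column. Such relations are not in first normal form; to normalize them, we split the values so that each row-column intersection holds a single value.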
Figure 14.19 An example of 1NF
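A minimal 1NF sketch, assuming a hypothetical relation in which one cell ("phones") holds several values. Here the repeating values are moved into a separate relation keyed by sid, which is one common way to reach first normal form; flattening to one row per phone in a single table would also satisfy 1NF.

```python
# Unnormalized: one cell holds several values (hypothetical phone data).
unnormalized = [
    {"sid": 53666, "name": "Jones", "phones": ["555-0101", "555-0102"]},
    {"sid": 53688, "name": "Smith", "phones": ["555-0199"]},
]

# 1NF: one atomic value at each row/column intersection.
students = [(r["sid"], r["name"]) for r in unnormalized]
student_phones = [(r["sid"], p) for r in unnormalized for p in r["phones"]]

print(students)        # [(53666, 'Jones'), (53688, 'Smith')]
print(student_phones)  # [(53666, '555-0101'), (53666, '555-0102'), (53688, '555-0199')]
```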
Second normal form (2NF)
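In each relation we need to have a key (called a primary key) on which all other attributes (column values) depend. For example, if the ID of a student is given, it should be possible to find the student’s name. In second normal form, every non-key attribute must depend on the whole primary key; when the key is composite, no attribute may depend on only part of it.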
An example of 2NF
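A 2NF sketch using a partly hypothetical enrollment relation with composite key (studid, cid). The instructor depends only on cid, which is a partial dependency, so the instructor is stored again for every student in the same course. Moving it into its own relation removes that redundancy. The third row (53650 in History105) is invented for illustration.

```python
# Key: (studid, cid). instructor depends only on cid, so this is not in 2NF.
enrolled = [
    (53666, "History105", "B", "Alice"),
    (53831, "Carnatic101", "C", "Jane"),
    (53650, "History105", "A", "Alice"),   # "Alice" stored a second time
]

# 2NF decomposition: move the partially dependent attribute to its own relation.
enrollment = [(sid, cid, grade) for sid, cid, grade, _ in enrolled]
course_instructor = sorted({(cid, instr) for _, cid, _, instr in enrolled})

print(course_instructor)  # [('Carnatic101', 'Jane'), ('History105', 'Alice')]
```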
Other normal forms
Other normal forms use more complicated dependencies among attributes. We leave these dependencies to books dedicated to the discussion of database topics.
AN UNNORMALIZED RELATION FOR ORDER
Example
An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each order. There is only a one‐to‐one correspondence between Order_Number and Order_Date.
NORMALIZED TABLES CREATED FROM ORDER
Entity‐relationship diagram
Used by database designers to document the data model
Illustrates relationships between entities
Map binary relationships
• The procedure for representing relationships depends on both the degree of the relationships (unary, binary, ternary) and the cardinalities of the relationships
Map binary one-to-one relationships (1:1)
In a 1:1 relationship, the association in one direction is nearly always an optional one, while the association in the other direction is a mandatory one
You should include in the relation on the optional side of the relationship the foreign key of the entity type that has the mandatory participation in the 1:1 relationship
Map binary one‐to‐one relationships
• Any attributes associated with the relationship itself are also included in the same relation as the foreign key
• The following figure shows a binary 1:1 relationship between NURSE and CARE_CENTER, where each care centre must have a nurse who is in charge of that centre; so the association from care centre to nurse is a mandatory one, while the association from nurse to care centre is an optional one (since any nurse may or may not be in charge of a care centre)
Mapping a binary 1:1 relationship
Binary 1:1 relationship
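One way to sketch this mapping in SQLite, following the usual textbook layout for this example in which CARE_CENTER carries the foreign key to NURSE. The column names and rows are hypothetical. `NOT NULL` captures the mandatory side (every centre has a nurse in charge), `UNIQUE` keeps the relationship 1:1, and `date_assigned` stands in for an attribute of the relationship itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE nurse (
    nurse_id TEXT PRIMARY KEY,
    name     TEXT
);
CREATE TABLE care_center (
    center_name     TEXT PRIMARY KEY,
    location        TEXT,
    nurse_in_charge TEXT NOT NULL UNIQUE REFERENCES nurse(nurse_id),  -- mandatory, 1:1
    date_assigned   TEXT                  -- attribute of the relationship itself
);
INSERT INTO nurse VALUES ('N1', 'Ann'), ('N2', 'Bea');   -- N2 runs no centre (optional side)
INSERT INTO care_center VALUES ('Cardiology', 'East wing', 'N1', '2012-11-01');
""")

try:
    # A second centre cannot have N1 as its nurse in charge.
    conn.execute("INSERT INTO care_center VALUES ('Oncology', 'West wing', 'N1', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```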
Map binary one‐to‐many (1:M) relationships
• First create a relation for each of the two entity types participating in the relationship
• Next, include the primary key attribute(s) of the entity on the one-side as a foreign key in the relation on the many-side
• The ‘Submits’ relationship in the following figure shows the primary key Customer_ID of CUSTOMER (the one-side) included as a foreign key in ORDER (the many-side) (signified by the arrow)
Example of mapping a 1:M relationship
Relationship between customers and orders
Note the mandatory one
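A SQLite sketch of the 1:M mapping: the primary key of CUSTOMER is copied into the ORDER relation as a foreign key. `ORDER` is an SQL keyword, so the table is named `orders` here; the columns and rows are hypothetical, and `NOT NULL` assumes (per the note above) that every order must belong to a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT
);
CREATE TABLE orders (                   -- ORDER is a reserved word in SQL
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)  -- FK on the many-side
);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO orders VALUES (100, '2012-11-20', 1), (101, '2012-11-21', 1);
""")

count = conn.execute("SELECT COUNT(*) FROM orders WHERE customer_id = 1").fetchone()[0]
print(count)  # 2: one customer, many orders
```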
Map binary many‐to‐many (M:N) relationships
• If such a relationship exists between entity types A and B, we create a new relation C, then include as foreign keys in C the primary keys for A and B, then these attributes become the primary key of C
• In the following figure, a relation is first created for each of VENDOR and RAW_MATERIALS; then a relation QUOTE is created for the ‘Supplies’ relationship, with a primary key formed from the combination of Vendor_ID and Material_ID (the primary keys of VENDOR and RAW_MATERIALS). These are foreign keys that point to the respective primary keys.
Example of mapping an M:N relationship
ER diagram (M:N)
The Supplies relationship will need to become a separate relation
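The M:N mapping can be sketched in SQLite: QUOTE holds both foreign keys, and together they form its composite primary key. The vendor and material rows and the `unit_price` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vendor (vendor_id TEXT PRIMARY KEY, vendor_name TEXT);
CREATE TABLE raw_materials (material_id TEXT PRIMARY KEY, description TEXT);
-- QUOTE represents the 'Supplies' relationship.
CREATE TABLE quote (
    vendor_id   TEXT REFERENCES vendor(vendor_id),
    material_id TEXT REFERENCES raw_materials(material_id),
    unit_price  REAL,                       -- hypothetical relationship attribute
    PRIMARY KEY (vendor_id, material_id)    -- composite key built from both FKs
);
INSERT INTO vendor VALUES ('V1', 'Alpha'), ('V2', 'Beta');
INSERT INTO raw_materials VALUES ('M1', 'Steel'), ('M2', 'Glass');
INSERT INTO quote VALUES ('V1', 'M1', 2.5), ('V1', 'M2', 4.0), ('V2', 'M1', 2.4);
""")

vendors_for_steel = conn.execute(
    "SELECT v.vendor_name FROM quote q JOIN vendor v ON v.vendor_id = q.vendor_id "
    "WHERE q.material_id = 'M1' ORDER BY v.vendor_name").fetchall()
print(vendors_for_steel)  # [('Alpha',), ('Beta',)]
```

Each vendor can supply many materials and each material can come from many vendors, yet each (vendor, material) pair appears at most once.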
AN ENTITY‐RELATIONSHIP DIAGRAM
This graphic shows an example of an entity relationship diagram.
It shows that one ORDER can contain many LINE_ITEMs. (A PART can be
ordered many times and appear many times as a line item in a single order.)
Each LINE_ITEM can contain only one PART.
Each PART can have only one SUPPLIER, but many PARTs can be provided
by the same SUPPLIER.
This diagram shows the relationships between the entities SUPPLIER, PART, LINE_ITEM, and ORDER that might be used to model the database Distributing databases:
Operations
Storing database in more than one place
Partitioned: Separate locations store different parts of database
Replicated: Central database duplicated in entirety
at different locations Distributed Databases
There are alternative ways of distributing a database. The central database can be partitioned (a) so that each remote processor has the necessary data to serve its own local needs. The central database also can be replicated (b) at all remote locations.
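The difference between (a) and (b) can be sketched in a few lines of Python. The rows and the choice of region as the partitioning key are hypothetical, and a real distributed DBMS also has to route queries and propagate updates between sites.

```python
# Hypothetical central table rows: (customer_id, region, balance)
CENTRAL = [
    (1, "East", 120.0),
    (2, "West", 75.5),
    (3, "East", 10.0),
    (4, "South", 300.0),
]

def partition(rows, key_index=1):
    """(a) Partitioned: each location keeps only the rows for its own region."""
    sites = {}
    for row in rows:
        sites.setdefault(row[key_index], []).append(row)
    return sites

def replicate(rows, site_names):
    """(b) Replicated: every location receives a full copy of the central rows."""
    return {site: list(rows) for site in site_names}

parts = partition(CENTRAL)
copies = replicate(CENTRAL, ["East", "West", "South"])
print({site: len(r) for site, r in parts.items()})   # {'East': 2, 'West': 1, 'South': 1}
print({site: len(r) for site, r in copies.items()})  # {'East': 4, 'West': 4, 'South': 4}
```

The trade-off shows in the counts: partitioning stores each row once, but a question spanning regions must contact several sites; replication answers every question locally, but each update must be copied to all sites.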
Using Databases to Improve Business Performance and Decision Making
Very large databases and systems require special capabilities and tools to analyze large quantities of data and to access data from multiple systems
Three key techniques
1. Data warehousing
2. Data mining
3. Tools for accessing internal databases through the Web
DATABASE MANAGEMENT SYSTEM TOOLS
Five software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem
DBMS Engine
• DBMS engine – accepts logical requests from the various other DBMS subsystems, converts them into their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device
• DBMS engine separates the logical from the physical
• Physical view – how information is physically arranged, stored, and accessed on some type of storage device
• Logical view – how you as a knowledge worker need to arrange and access information
• With a database, you only concern yourself with your logical view
Data Definition Subsystem
• Data definition subsystem – helps you create and maintain the data dictionary and define the structure of the files in a database
• You must create a data dictionary before entering information into a database
• Module J covers this for Microsoft Access
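As a minimal sketch of data definition, the SQLite code below declares a table's fields, types, and rules, then reads the resulting dictionary back with `PRAGMA table_info`. SQLite is used only because it ships with Python; the `INVENTORY` table and the helper names are hypothetical, and Access exposes the same ideas through its table design view.

```python
import sqlite3

def define_structure(conn: sqlite3.Connection) -> None:
    # Data definition: field names, types, and rules are declared before any data is entered
    conn.execute("""
        CREATE TABLE INVENTORY (
            Part_Number TEXT NOT NULL PRIMARY KEY,
            Description TEXT NOT NULL,
            Unit_Price  REAL CHECK (Unit_Price >= 0)
        )
    """)

def data_dictionary(conn: sqlite3.Connection, table: str) -> list[dict]:
    # SQLite stores the dictionary in its schema catalog; PRAGMA table_info reads it back.
    # The table name is interpolated into the SQL, so pass only trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {"field": name, "type": col_type, "required": bool(notnull), "key": bool(pk)}
        for _cid, name, col_type, notnull, _default, pk in rows
    ]

conn = sqlite3.connect(":memory:")
define_structure(conn)
for entry in data_dictionary(conn, "INVENTORY"):
    print(entry)
```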
Data Manipulation Subsystem
• Data manipulation subsystem – helps you add, change, and delete information
• This is your primary DBMS interface as you work with a database. Its main tools are:
– Views
– Report generators
– QBE tools
– SQL
Views
• View – allows you to see the contents of a database file
– Make whatever changes you want
– Perform simple sorting
– Query to find the location of information
– Looks similar to a workbook with no row numbers
Report Generators
• Report generator – helps you quickly define formats of reports and what information you want to see in a report
• You can save report formats and generate reports at any time with up‐to‐date information
QBE Tools
• Query‐by‐example (QBE) tool – helps you graphically design the answer to a question
• “What driver most often delivers concrete to Triple A Homes?”
SQL
• Structured query language (SQL) – standardized fourth‐generation language found in most DBMSs
• Performs the same task as a QBE tool
– But uses a sentence structure instead of a point‐and‐click interface
• SQL is used mostly by IT people
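One way to phrase the Triple A Homes question in SQL, as a sketch: the single `DELIVERY` table, its columns, and the rows below are hypothetical (a real design would normalize drivers, customers, and products into separate relations). `GROUP BY` counts deliveries per driver and `ORDER BY ... LIMIT 1` keeps the most frequent one; `LIMIT` is SQLite syntax, and some DBMSs use `TOP` or `FETCH FIRST` instead.

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical, denormalized table: one row per delivery
    conn.execute("CREATE TABLE DELIVERY (Driver TEXT, Customer TEXT, Product TEXT)")
    conn.executemany("INSERT INTO DELIVERY VALUES (?, ?, ?)", [
        ("Jones", "Triple A Homes", "Concrete"),
        ("Smith", "Triple A Homes", "Concrete"),
        ("Jones", "Triple A Homes", "Concrete"),
        ("Smith", "Triple A Homes", "Gravel"),
        ("Smith", "Bay Builders", "Concrete"),
    ])
    return conn

# The sentence-style equivalent of filling in a QBE grid
TOP_DRIVER_SQL = """
    SELECT Driver, COUNT(*) AS Deliveries
    FROM DELIVERY
    WHERE Customer = ? AND Product = ?
    GROUP BY Driver
    ORDER BY Deliveries DESC, Driver
    LIMIT 1
"""

def top_driver(conn, customer, product):
    """Return (driver, delivery_count) for the most frequent driver, or None."""
    return conn.execute(TOP_DRIVER_SQL, (customer, product)).fetchone()

conn = build()
print(top_driver(conn, "Triple A Homes", "Concrete"))  # ('Jones', 2)
```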
Application Generation Subsystem
• Application generation subsystem – contains facilities to help you develop transaction‐intensive applications
– Data entry screens (called forms)
– Programming languages
• Used mostly by IT specialists
Data Administration Subsystem
• Data administration subsystem – helps you manage the overall database environment
– Backup and recovery
– Security management
– Query optimization
– Concurrency control
– Change management
• Backup and recovery
– Periodically back up information
– Recover a database if a failure occurs
• Security management
– Who has access to what information
– Who can perform certain tasks (e.g., add, change, or delete) on information
• Query optimization
– Restructure physical view of information to optimize response times to queries
• Concurrency control
– What happens if two people make changes to the same information at the same time?
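One common answer is optimistic concurrency control with a version number, sketched below in SQLite. It is one technique among several (locking is another), not necessarily the one a given DBMS uses; the `STOCK` table, its `Version` column, and the helpers are hypothetical, and both "users" share a single connection purely for illustration.

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE STOCK (Item TEXT PRIMARY KEY, Qty INTEGER, Version INTEGER)")
    conn.execute("INSERT INTO STOCK VALUES ('Widget', 10, 1)")
    return conn

def read(conn, item):
    """Return (quantity, version) as the user sees it right now."""
    return conn.execute("SELECT Qty, Version FROM STOCK WHERE Item = ?", (item,)).fetchone()

def write(conn, item, new_qty, version_seen) -> bool:
    """Save only if the row still has the version this user read."""
    cur = conn.execute(
        "UPDATE STOCK SET Qty = ?, Version = Version + 1 WHERE Item = ? AND Version = ?",
        (new_qty, item, version_seen),
    )
    return cur.rowcount == 1  # 0 rows updated: someone else changed it first

conn = build()
qty_a, ver_a = read(conn, "Widget")             # user A reads (10, 1)
qty_b, ver_b = read(conn, "Widget")             # user B reads the same (10, 1)
print(write(conn, "Widget", qty_a - 3, ver_a))  # True: A's change is saved
print(write(conn, "Widget", qty_b - 2, ver_b))  # False: B's stale change is refused
qty_b, ver_b = read(conn, "Widget")             # B re-reads (7, 2) and retries
print(write(conn, "Widget", qty_b - 2, ver_b))  # True
print(read(conn, "Widget"))                     # (5, 3)
```

Without the version check, B's write would have set the quantity to 8 and silently erased A's change (a "lost update").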
• Change management
– What is the effect of structural changes to a database?
– What if you add a new column?
– What happens if you delete a column?
– What happens if you change a column’s attributes?
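A small illustration of the first question, adding a column, in SQLite: existing rows pick up the column's `DEFAULT` (or NULL without one), `SELECT *` now returns a wider row, and an `INSERT` written against the old structure stops working. The `EMPLOYEE` table and the `Dept` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Emp_ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [(1, "Lee"), (2, "Kim")])

# Structural change: add a column; existing rows receive the DEFAULT value
conn.execute("ALTER TABLE EMPLOYEE ADD COLUMN Dept TEXT DEFAULT 'Unassigned'")

# SELECT * now returns three values per row, so code expecting two would break
rows = conn.execute("SELECT * FROM EMPLOYEE ORDER BY Emp_ID").fetchall()
print(rows)  # [(1, 'Lee', 'Unassigned'), (2, 'Kim', 'Unassigned')]

# An INSERT written before the change (no column list, two values) now fails
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (3, 'Ray')")
    old_insert_ok = True
except sqlite3.Error as err:
    old_insert_ok = False
    print("old-style INSERT rejected:", err)
```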
DATA WAREHOUSES AND DATA MINING
• Data warehouses support OLAP and decision making
• Data warehouses do not support OLTP
• Data‐mining tools are the tools you use to work with a data warehouse
– DBMS software = database
– Data‐mining tools = data warehouse
What Is a Data Warehouse?
• Data warehouse – logical collection of information – gathered from operational databases – used to create business intelligence that supports business analysis activities and decision‐making tasks
Components of a Data Warehouse
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Data Warehouse Summary
• Multidimensional
• Rows and columns
• Also layers
• Many times called hypercubes
The Database Approach to Data Management
MULTIDIMENSIONAL DATA MODEL
The view that is showing is product versus region. If you rotate the cube 90 degrees, the face that will show is product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. Other views are possible.
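Rotating the cube amounts to choosing which two dimensions form the visible face while the third is held at one value. A minimal sketch with a hypothetical three‐dimensional cube (product, region, actual vs. projected) and invented figures; the names `DIMS`, `CUBE`, and `face` are illustrative only.

```python
DIMS = ("product", "region", "sales_type")

# Hypothetical cube cells: (product, region, sales_type) -> units sold
CUBE = {
    ("Nuts", "East", "Actual"): 50,  ("Nuts", "East", "Projected"): 60,
    ("Nuts", "West", "Actual"): 60,  ("Nuts", "West", "Projected"): 70,
    ("Bolts", "East", "Actual"): 40, ("Bolts", "East", "Projected"): 30,
    ("Bolts", "West", "Actual"): 20, ("Bolts", "West", "Projected"): 25,
}

def face(cube, row_dim, col_dim, hidden_value):
    """Return the row_dim x col_dim face, fixing the third dimension at hidden_value."""
    hidden = next(d for d in DIMS if d not in (row_dim, col_dim))
    r, c, h = DIMS.index(row_dim), DIMS.index(col_dim), DIMS.index(hidden)
    return {(k[r], k[c]): v for k, v in cube.items() if k[h] == hidden_value}

# Front face: product versus region (actual sales layer)
print(face(CUBE, "product", "region", "Actual"))
# Rotate 90 degrees: product versus actual and projected sales (East)
print(face(CUBE, "product", "sales_type", "East"))
# Rotate again: region versus actual and projected sales (Nuts)
print(face(CUBE, "region", "sales_type", "Nuts"))
```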
Functions
• Online transaction processing (OLTP) – the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information
– Databases support OLTP
– Operational database – databases that support OLTP
Functions
• Online analytical processing (OLAP) – the manipulation of information to support decision making
– Databases can support some OLAP
– Data warehouses only support OLAP, not OLTP
– Why? Data warehouses are special forms of databases that support decision making
Online analytical processing (OLAP)
Supports multidimensional data analysis
Viewing data using multiple dimensions
Each aspect of information (product, pricing, cost, region, time period) is a different dimension
E.g., how many washers sold in the East in June compared with other regions?
OLAP enables rapid, online answers to ad hoc queries
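The washers question is a typical OLAP slice: hold two dimensions fixed (product and month) and compare along a third (region). A minimal SQLite sketch with a hypothetical `SALES` fact table and made‐up figures:

```python
import sqlite3

def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical fact table: one row per sales record
    conn.execute("CREATE TABLE SALES (Product TEXT, Region TEXT, Month TEXT, Units INTEGER)")
    conn.executemany("INSERT INTO SALES VALUES (?, ?, ?, ?)", [
        ("Washers", "East", "June", 120), ("Washers", "East", "June", 30),
        ("Washers", "West", "June", 90),  ("Washers", "South", "June", 60),
        ("Washers", "East", "July", 200), ("Bolts", "East", "June", 500),
    ])
    return conn

def units_by_region(conn, product, month):
    """Fix product and month, then compare total units across regions."""
    return conn.execute("""
        SELECT Region, SUM(Units) AS Total
        FROM SALES
        WHERE Product = ? AND Month = ?
        GROUP BY Region
        ORDER BY Total DESC
    """, (product, month)).fetchall()

conn = build()
print(units_by_region(conn, "Washers", "June"))  # [('East', 150), ('West', 90), ('South', 60)]
```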
Data marts: Subset of data warehouse
Summarized or highly focused portion of firm’s data for use by specific population of users
Typically focuses on single subject or line of business
Data warehouse: Stores current and historical data from many core operational transaction systems
Consolidates and standardizes information for use across enterprise, but data cannot be altered
Data warehouse system will provide query, analysis, and reporting tools
Data Marts
• Data warehouses can support all of an organization’s information
• Data marts have subsets of an organizationwide data warehouse
• Data mart – subset of a data warehouse in which only a focused portion of the data warehouse information is kept
Components of a Data Mart
Object in Business Information Systems
Business Intelligence (BI): Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions
E.g., Harrah’s Entertainment analyzes customers to develop gambling profiles and identify most profitable customers
Principal tools include:
Software for database query and reporting
Online analytical processing (OLAP)
Data mining
Data mining:
More discovery driven than OLAP
Finds hidden patterns, relationships in large databases and infers rules to predict future behavior
E.g., Finding patterns in customer data for one‐to‐one marketing campaigns or to identify profitable customers.
Types of information obtainable from data mining
Associations, Sequences, Classification
Clustering, Forecasting
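Associations are often measured with support (how often items appear together) and confidence (how often the right‐hand item appears when the left‐hand one does). The sketch below counts item pairs only, a simplification of full association‐rule algorithms such as Apriori; the baskets, the thresholds, and the function name `pair_rules` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets
BASKETS = [
    {"chips", "soda", "salsa"},
    {"chips", "soda"},
    {"chips", "salsa"},
    {"soda", "bread"},
    {"chips", "soda", "bread"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pair rules passing both thresholds."""
    n = len(baskets)
    item_count = Counter(item for b in baskets for item in b)
    pair_count = Counter(p for b in baskets for p in combinations(sorted(b), 2))
    rules = []
    for (a, b), together in pair_count.items():
        support = together / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return sorted(rules)

for rule in pair_rules(BASKETS):
    print(rule)
```

Here "salsa → chips" has confidence 1.0 (every salsa basket also has chips), while the reverse rule fails because only half of the chips baskets contain salsa.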
Predictive analysis in data mining:
Uses data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events
E.g., Probability a customer will respond to an offer
Information Vs. Intelligence
What Are Data‐Mining Tools?
• Data‐mining tools – software tools that you use to query information in a data warehouse
– Query‐and‐reporting tools
– Intelligent agents
– Multidimensional analysis tools
– Statistical tools
Converging Disciplines
Query‐And‐Reporting Tools
• Query‐and‐reporting tools – similar to QBE tools, SQL, and report generators in the typical database environment
Intelligent Agents
• Use various artificial intelligence tools such as neural networks and fuzzy logic to form the basis for “information discovery” and building business intelligence
• Help you find hidden patterns in information
Multidimensional Analysis Tools
• Multidimensional analysis (MDA) tools – slice‐and‐dice techniques that allow you to view multidimensional information from different perspectives
– Bring new layers to the front
– Reorganize rows and columns
Statistical Tools
• Help you apply various mathematical models to the information stored in a data warehouse to discover new information
– Regression
– Analysis of variance
– And so on
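As an example of the regression item, here is ordinary least squares for a single predictor written out by hand. The ad‐spend and sales figures are invented, and `fit_line` is an illustrative name; statistical packages wrap the same computation with additional diagnostics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: monthly ad spend (in $1,000s) vs. units sold (in 100s)
ad_spend = [1, 2, 3, 4, 5]
units = [3, 5, 6, 9, 12]

slope, intercept = fit_line(ad_spend, units)
print(f"units = {intercept:.1f} + {slope:.1f} * ad_spend")  # units = 0.4 + 2.2 * ad_spend
print("forecast at 6:", round(intercept + slope * 6, 1))
```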
Enterprise Application Integration
• “Re‐architecting” existing programs so that an intermediate layer, termed middleware, is developed between the applications and the databases
• Applications are designed to make calls to the middleware layer rather than to the other applications
• Streamlines maintenance process because changes to an application will not affect all the interfaces connected to it
© Gabriele Piccoli
The EAI Approach
[Figure: legacy applications, ERP, and SCM systems reach their databases through a shared middleware layer that also handles meta data operations]
SCM: Supply Chain Management
ERP: Enterprise Resource Planning
CRM Infrastructure
DSS Characteristics and Capabilities
DSS Components
Data Management Subsystem
• DSS database
• DBMS
• Data directory
• Query facility
A Web‐Based DSS Architecture
Expert Systems vs. DSS
Expert System
• Inject expert knowledge into a computer system.
• Automate decision making.
• The decision environments have structure
• The alternatives and goals are often established in advance.
• The expert system can eventually replace the human decision maker.
Decision Support System
• Extract or gain knowledge from a computer system
• Facilitates decision making
• Unstructured environment
• Alternatives may not be fully realized yet
• Use goals and the system data to establish alternatives and outcomes, so a good decision can be made
Artificial Intelligence and Decision Support Systems in Business are covered in the Appendix.
Web pages and documents are data warehouses, too
WHAT CAN BUSINESSES LEARN FROM TEXT MINING?
Text mining
Extracts key elements from large unstructured data sets (e.g., stored e‐mails)
What challenges does the increase in unstructured data present for businesses?
How does text‐mining improve decision‐making?
What kinds of companies are most likely to benefit from text mining software?
In what ways could text mining potentially lead to the erosion of personal information privacy?
Web mining
Discovery and analysis of useful patterns and information from WWW
E.g., to understand customer behavior, evaluate effectiveness of Web site, etc.
Web content mining
Knowledge extracted from content of Web pages
Web structure mining
E.g., links to and from Web page
Web usage mining
User interaction data recorded by Web server
Databases and the Web
Many companies use Web to make some internal databases available to customers or partners
Typical configuration includes:
Web server
Application server/middleware/CGI scripts
Database server (hosting the DBMS)
Advantages of using Web for database access:
Ease of use of browser software
Web interface requires few or no changes to database
Inexpensive to add Web interface to system
Firms use the Web to make information from their internal databases available to customers and partners
• Middleware and other software make this possible
• Database servers
• CGI (Common Gateway Interface)
• Web interfaces provide familiarity to users and savings over redesigning and rebuilding legacy systems
LINKING INTERNAL DATABASES TO THE WEB
Users access an organization’s internal database through the Web using their desktop PCs and Web browser software.
Managing Data Resources
Establishing an information policy
Firm’s rules, procedures, roles for sharing, managing, standardizing data
Data administration: Firm function responsible for specific policies and procedures to manage data
Data governance: Policies and processes for managing availability, usability, integrity, and security of enterprise data, especially as it relates to government regulations
Database administration:
Defining, organizing, implementing, maintaining database; performed by database design and management group
Nature and Quality of Data
Basic: True Data
Good: Many (File, Record)
Better: Organized (Database, Data Warehouse)
Best: Analysis, Intelligence (Data Mining, Intelligence)
MANAGING THE INFORMATION RESOURCE
• Information is an organizational resource
• Just like people, capital, and equipment
• It must be managed effectively, based on true data and systems
• Who should oversee your organization’s information resource?
– Chief information officer (CIO) – oversees an organization’s information resource
– Data administration – plans for, oversees the development of, and monitors the information resource
– Database administration – technical and operational aspects of managing information
• Is information ownership a consideration?
– If you create information, you “own” it
– You will also share it with others
– Because you “own” it, you are responsible for its quality
• How “clean” must your information be?
– Duplicate information (records) must be eliminated
– Inaccurate information must be corrected
– Information forms the basis of business intelligence
– If your business intelligence is bad, you will make poor decisions
Ensuring data quality
More than 25% of critical data in Fortune 1000 company databases are inaccurate or incomplete
Most data quality problems stem from faulty input
Before new database in place, need to:
Identify and correct faulty data
Establish better routines for editing data once database in operation
Data quality audit:
Structured survey of the accuracy and level of completeness of the data in an information system
Survey samples from data files, or
Survey end users for perceptions of quality
Data cleansing
Software to detect and correct data that are incorrect, incomplete, improperly formatted, or redundant
Enforces consistency among different sets of data from separate information systems
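A minimal sketch of what data‐cleansing software does: standardize formats, drop redundant copies, and separate incomplete records. The records, the `STATE_ALIASES` lookup, and the rule that two records match when name and phone agree are all hypothetical; commercial tools use far richer matching.

```python
import re

# Hypothetical customer records pulled from two separate systems
RECORDS = [
    {"name": "  Mary  Smith", "phone": "(831) 555-0100", "state": "ca"},
    {"name": "mary smith",    "phone": "831.555.0100",   "state": "CA"},
    {"name": "Raj Patel",     "phone": "831-555-0199",   "state": "Calif."},
    {"name": "Lee Wong",      "phone": "",               "state": "CA"},
]

# Hypothetical lookup table for non-standard state spellings
STATE_ALIASES = {"calif.": "CA", "california": "CA"}

def standardize(rec):
    """Put one record into a consistent format."""
    name = " ".join(rec["name"].split()).title()   # collapse spaces, fix case
    digits = re.sub(r"\D", "", rec["phone"])         # keep digits only
    phone = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}" if len(digits) == 10 else None
    state = STATE_ALIASES.get(rec["state"].lower(), rec["state"].upper())
    return {"name": name, "phone": phone, "state": state}

def cleanse(records):
    """Standardize, drop redundant copies, and separate incomplete records."""
    seen, clean, incomplete = set(), [], []
    for rec in map(standardize, records):
        key = (rec["name"], rec["phone"])  # hypothetical matching rule
        if key in seen:
            continue
        seen.add(key)
        if all(rec.values()):
            clean.append(rec)
        else:
            incomplete.append(rec)
    return clean, incomplete

clean, incomplete = cleanse(RECORDS)
print(clean)       # two standardized records; the second Mary Smith was a duplicate
print(incomplete)  # Lee Wong, whose phone number is missing
```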
CREDIT BUREAU ERRORS: BIG PEOPLE PROBLEMS
Assess the business impact of credit bureaus’ data quality problems for the credit bureaus, for lenders, for individuals.
Are any ethical issues raised by credit bureaus’ data quality problems?
Analyze the people, organization, and technology factors responsible for credit bureaus’ data quality problems.
What can be done to solve these problems?
Data Mining as a Career Opportunity
• Knowledge of data mining can be a substantial career opportunity for you
– Query and Analysis and Enterprise Analytic Tools (Business Objects)
– Business Intelligence and Information Access tools (SAS)
– Many in Cognos (the data warehouse leader)
– PowerAnalyzer (Informatica)
SAS: originally Statistical Analysis System (SAS Institute)
Review
Describe how a relational database organizes data and compare its benefits
Identify and describe the principles of a database management system.
Evaluate tools and technologies for providing information from databases to improve business performance and decision making.
CAN YOU…
Describe business intelligence and its role
Compare databases and data warehouses by OLTP and OLAP
Define 5 software components of a DBMS
CAN YOU…
List/describe key characteristics of a data warehouse
Define 4 major types of data‐mining tools
List key considerations in managing information as a resource
Appendix for Business Intelligence
DSS: Decision Support Systems and AI: Artificial Intelligence
In Business
AI in Business
Some Commercial Applications
• Decision Support
• Expert Systems
• Information Retrieval
• Virtual Reality
• Robotics
I’m ready to do some business
Overview of AI
• Goal of AI
– develop computer systems that exhibit intelligence or simulate the ability to think
• AI pioneered by Computer Science
• But, AI involves a combination of
– Computer Science, Biology, Psychology, Linguistics, Mathematics, Engineering
What really is Intelligence?
• Specifically, what are the signs of Intelligent Behavior?
• Think about it for a while
Which of the following is the best example of intelligent behavior?
1. Ability to add numbers
2. Ability to see and recognize objects
3. Ability to adapt to surroundings
4. Ability to learn from mistakes
What really is Intelligence?
• You are about to start an online chat (IM) with two entities:
– One entity is a human
– The other is a computer
• After hours of conversation, you cannot tell which entity is the computer.
• Does this mean the computer is Intelligent?
Intelligent Behavior
• What are some of the signs, attributes, or characteristics of Intelligent Behavior
Characteristics of Intelligent Behavior
1. Learn from experience & apply the knowledge
– Computers can automatically improve performance based on experience
– Machine learning
– Computational learning
Characteristics of Intelligent Behavior
2. Handle complex situations
– Computer systems can often handle complexity better than humans
– Consider a process control system that must simultaneously track 100 different system variables.
Characteristics of Intelligent Behavior
3. Solve problems when important information is missing
– Computer systems can find patterns and deal with all sorts of missing information
Characteristics of Intelligent Behavior
4. React quickly & correctly to new situations; Acquire & Apply Knowledge
– Here is where computers start to fail.
– Adapting to completely new situations is a problem for computer systems.
– It's very difficult to design a computer system that can combine, connect, and acquire knowledge to solve completely new problems.
Characteristics of Intelligent Behavior
5. Determine what is important.
6. Exhibit creativity and imagination.
7. Process visual information efficiently.
8. Use reason to solve problems.
– These are some other characteristics that humans possess.
– Computer systems have a lot of catching up to do.
Which of the following do computers need to catch up on?
1. Determine what is important.
2. Exhibit creativity and imagination
3. Process visual information efficiently
4. Use reason to solve problems
AI in Business
• AI continues to improve and evolve.
• Scientists and Engineers are pushing the envelope of what is possible.
• In Business, there is a better understanding of the capabilities of Intelligent Computer Systems
• It is important to know which types of problems are suited for humans, and which are suited for computers.
Human Intelligence vs. AI
Attribute | Human Intelligence | Artificial Intelligence
Use a variety of information sources | High | High
Ability to acquire large amounts of external information | Medium | High
Ability to do rapid, accurate, and complex calculations | Low | High
Ability to transfer information rapidly | Low | High
Ability to use sensors or senses | High | Medium
Creativity or imagination | High | Low
Ability to learn from experience | High | Medium
Ability to be adaptive | High | Medium
AI: Application Domains
AI: Commercial Domains
• Decision Support
– Integrating the advantages of AI with Human Intelligence.
– More intelligent Interfaces
– More intelligent processing for massive data
AI: Commercial Domains
• Information Retrieval
– Automatic simplification for massive data
– Natural language technology: computer can speak our language.
AI: Commercial Domains
• Virtual Reality
– Better training environment from pilots to doctors
• Robotics
– Bringing the precision and speed of computers into the physical world
– Goes beyond manufacturing and assembly lines; Baggage Inspection, Bomb Removal, Replacement Limbs.
Expert Systems
• The idea is to inject expert knowledge into a computer system.
• The primary purpose is to automate decision making.
• The decision environments have structure
• The alternatives and goals are often established in advance.
What is the biggest difference between a Decision Support System and an MIS?
1. DSSs are interactive and ad hoc
2. DSSs focus on transforming information into knowledge
3. MISs focus on transforming data into information
4. All of the above
What is the biggest difference between an MIS and a TPS?
1. In a TPS there is no analysis
2. An MIS focuses on reports
3. A TPS focuses on updating a database
4. All of the above
How is the analysis different for an MIS vs. a DSS?
1. MIS: Analysis involves computing aggregates
2. MIS: Analysis involves creating useful charts and graphs
3. DSS: Connects information with decisions
4. DSS: Builds scenarios
Some Interesting Applications of Expert Systems
• Triage – Medical Diagnosis (Medical Expert System)
– User enters symptoms
– System makes diagnosis
– Doctors’ collective expertise is captured in the system
• Patriot Missile Guidance System
– Radar identifies Scud missile
– System steers Patriot missile until it intercepts the Scud missile
– Laws of physics, expert knowledge about missile trajectory is captured in the system
• Financial Decision Making – Currency Trading
Expert System Categories
• Decision Making
– buy/sell
– risk/no risk
– rain/ no rain
• Trouble Shooting / Diagnosis
• Selection/Classification
– Tell me what you see, expert system figures out what it really is...
• Process Monitoring and Control
– Robot control, assembly‐line control, missile control
• Design/Configuration
– Specify what you want, and the expert system figures out specifically how to do it.
“Hello, welcome to Dell; how can I help you?” Suddenly an idiot seems like an expert.
Expert System Components
[Figure: a user (or a non‐expert, a robot, or a missile) works with the expert system software through its interface; the engine applies the knowledge base]
Expert System Development Process
[Figure: an expert or knowledge engineer uses a knowledge acquisition program to turn raw data or facts into the knowledge base]
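To make the three components concrete, here is a toy expert system: a knowledge base of IF‐THEN rules, an engine that applies them by forward chaining (one common inference technique, assumed here rather than taken from the slides), and a one‐function user interface. The printer‐troubleshooting rules and all names are hypothetical.

```python
# Knowledge base: hypothetical IF-THEN rules (set of conditions -> conclusion)
RULES = [
    ({"no_power_light"}, "check_power_cable"),
    ({"power_light", "paper_jam_light"}, "clear_paper_jam"),
    ({"power_light", "prints_blank_pages"}, "low_toner"),
    ({"low_toner", "toner_recently_replaced"}, "reseat_toner_cartridge"),
]

def infer(facts, rules=RULES):
    """Engine: forward chaining until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(facts)  # only the newly derived conclusions

def consult(observations):
    """User interface: take the user's observations and report the advice."""
    advice = infer(observations)
    return sorted(advice) if advice else ["no rule applies; consult a human expert"]

print(consult({"power_light", "prints_blank_pages", "toner_recently_replaced"}))
# ['low_toner', 'reseat_toner_cartridge']
```

Note how the second conclusion is reached only because the first one was derived: chaining lets a non‐expert reach advice that no single rule states directly.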
Expert System vs. DSS
[Figure: in a DSS, the decision maker (someone with knowledge) works through the DSS software's user interface and engine, which draw on a model base of analytical and statistical models; DSS processes for data management (extraction, generation, validation, etc.) work from raw data or facts]