Download MIS9eCC03

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Information security wikipedia , lookup

Database wikipedia , lookup

Clusterpoint wikipedia , lookup

Data vault modeling wikipedia , lookup

Relational model wikipedia , lookup

Open data in the United Kingdom wikipedia , lookup

Information privacy law wikipedia , lookup

Business intelligence wikipedia , lookup

Database model wikipedia , lookup

Transcript
Chapter 3
Databases and Data
Warehouses
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
STUDENT LEARNING
OUTCOMES
1.
2.
3.
4.
5.
List and describe the key characteristics of a
relational database.
Define the 5 software components of a DBMS.
List and describe the key characteristics of a
data warehouse.
Define the 5 major types of data-mining tools.
List key considerations in information
ownership.
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
DID YOU KNOW CDs COME FROM
DEAD DINOSAURS?
In 2010, more than half of all music was in
digital form; physical music will never again be
the norm
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Questions
1.
2.
3.
How many digital music devices do
you use?
How many CDs do you own that you
bought from a retail store?
When was the last time you bought
music on a physical medium such as
a CD?
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
INTRODUCTION



Business intelligence (BI) –
collective information that gives you
the ability to make effective, important,
and strategic business decisions
Analytics – the science of fact-based
decision making
Both are huge in today’s business world
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
INTRODUCTION
•
•
Businesses use many IT tools to
manage and organize information
Online transaction processing
(OLTP) – gathering and processing
information and updating existing
information to reflect the processed
information
•
Online analytical processing
(OLAP) – manipulation of information
to support decision making
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
INTRODUCTION

OLTP




Supports operational processing
Sales orders, accounts receivable, etc
Supported by operational databases
& DBMSs
OLAP


Helps build business intelligence
Supported by data warehouses and
data-mining tools
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
OLTP, OLAP, and Business
Intelligence
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
CHAPTER ORGANIZATION
1.
Relational Database Model
–
2.
Database Management System Tools
–
3.
Learning Outcome #2
Data Warehouses and Data Mining
–
4.
Learning Outcome #1
Learning Outcomes #3 & #4
Information Ownership
–
Learning Outcome #6
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
RELATIONAL DATABASE
MODEL


Database – collection of information
that you organize and access
according to the logical structure of
the information
Relational database – series of
logically related two-dimensional
tables or files for storing information

Relation = table = file

Most popular database model
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Database Characteristics




Collections of information
Created with logical structures
Include logical ties within the
information
Include built-in integrity constraints
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Database – Collection of
Information
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Database – Created with Logical
Structures

Data dictionary – contains the logical
structure for the information in a
database
Before you can enter information
into a database, you must define
the data dictionary for all the
tables and their fields. For
example, when you create the
Truck table, you must specify that
it will have three pieces of
information and that Date of
Purchase is a field in Date
format.
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Database – Logical Ties within the
Information


Primary key – field (or group of fields)
that uniquely describes each record
Foreign key – primary key of one file that
appears in another file
Customer Number
is the primary key
for Customer and
appears in Order as
a foreign key
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Database – Logical Ties within the
Information
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Databases – Built-In Integrity
Constraints



Integrity constraints – rules that
help ensure the quality of information
Data dictionary, for example, defines
type of information – numeric, date,
and so on
Foreign keys – must be found as
primary keys in another file

E.G., a Customer Number in the Order
Table must also be present in the
Customer Table
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
DATABASE MANAGEMENT SYSTEM
TOOLS

Database
management
system (DBMS) –
helps you specify the
logical requirements
for a database and
access and use the
information in a
database
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
5 Components of a DBMS
1.
2.
3.
4.
5.
DBMS engine
Data definition subsystem
Data manipulation subsystem
Application generation subsystem
Data administration subsystem
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
DBMS Engine
•
•
•
DBMS engine – accepts logical requests
and converts them into the physical
equivalents, and access the database and
data dictionary on a storage device
Physical view – how information is
physically arranged, stored, and accessed
on a storage device
Logical view – how you need to arrange
and access information to meet your needs
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Definition Subsystem


Data definition subsystem – helps
you create and maintain the data
dictionary and structure of the files in
a database
The data dictionary helps you define…





Field names
Data types (numeric, etc)
Form (do you need an area code)
Default value
Is an entry required, etc
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Manipulation Subsystem



Data manipulation subsystem –
helps you add, change, and delete
information in a database and query
it to find valuable information
Most often your primary interface
Includes views, report generators,
query-by-example tools, and
structured query language
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
View

View – allows you to see the contents
of a database file, make changes, and
query it to find information
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Report Generator

Report
generator –
helps you quickly
define formats of
reports and what
information you
want to see in a
report
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Query-by-Example Tool

QBE tool – helps you graphically
design the answer to a question
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Structured Query Language



SQL – standardized fourth-
generation query language found in
most DBMSs
Sentence-structure equivalent to QBE
Mostly used by IT professionals
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Application Generation
Subsystem


Application generation
subsystem – contains facilities to
help you develop transactionintensive applications
Mainly used by IT professionals
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Administration Subsystem
•
Data administration subsystem –
helps you manage the overall
database environment by providing
facilities for…
–
–
–
–
–
–
Backup and recovery
Security management
Query optimization
Reorganization
Concurrency control
Change management
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Administration Subsystem

Backup and recovery – for backing
up information and restarting
(recovering) from a failure

Backup – copy of information on a
computer

Recovery – process of reinstalling the
backup information in the even the
information was lost
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Administration Subsystem



Security management – for CRUD
access – create, read, update, and
delete
Query optimization – to minimize
response times for large, complex
queries
Reorganization – for physically
rearranging the structure of the
information according to how you
most often access it
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Administration Subsystem


Concurrency control – what happens
if two people attempt to make
changes to the same record
Change management – how will
structural changes impact the overall
database
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
DATA WAREHOUSES AND DATA
MINING


Help you build and work
with BI and some forms
of knowledge
Data warehouse –
collection of information
(from many places) that
supports business
analysis activities and
decision making
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Warehouse
Characteristics

Multidimensional


Rows, columns, and layers
Support decision making, not
transaction processing


Contain summaries of information
Not every detail
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
The Tool Set of the Analytics
Professional

Data-mining tools – software tools
you use in a data warehouse
environment





Query-and-reporting tools
Artificial intelligence
Multidimensional analysis tools
Digital dashboards
Statistical tools
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
The Tool Set of the Analytics
Professional
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
The Tool Set of the Analytics
Professional
•
•
•
Query-and-reporting tools –
similar to QBE tools, SQL, and report
generators
Artificial intelligence – tools to help
you “discover” information and
trends (more in Chapter 4)
Multidimensional analysis (MDA
tools) – slice-and-dice techniques
for viewing multidimensional
information
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
The Tool Set of the Analytics
Professional
•
Digital dashboard – displays key
information on a computer screen
tailored to the needs and wants of an
individual
•
Key performance indicator (KPI) –
most essential information used in any
analytics initiative
•
Statistical tools – for applying
mathematical models to data
warehouse information
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Digital Dashboard
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
The Analytics Life Cycle
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Extraction, Transformation,
and Loading (ETL)

ETL is a three-step process
1.
2.
3.
Extract needed information from its
source
Transform the data into a standardized
format
Load the transformed data into a data
warehouse
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Marts

Data mart – subset of a data
warehouse in which only a focused
portion of the data warehouse
information is kept
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Data Warehouse
Considerations




Do you really need one, or does your
database environment support all
your functions?
Do all employees need a big data
warehouse or a smaller data mart?
How up-to-date must the information
be?
What data-mining tools do you need?
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
INFORMATION OWNERSHIP


Information is a resource you must
manage and organize to help the
organization meet its goals and
objectives
You need to consider



Strategic management support
Sharing information with responsibility
Information cleanliness
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Strategic Management Support
•
•
•
•
CIO – every aspect of an
organization’s information resource
CTO – the underlying IT
infrastructure and user-facing
technologies
CSO – technical aspects for
security of information
CPO – information is used in an
ethical way
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Strategic Management Support
•
•
•
2 others in information management
Data administration – function
that plans for, oversees the
development of, and monitors the
information resource
Database administration –
function responsible for the more
technical and operational aspects of
managing organizational information
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Sharing Information


Everyone can share – while not
consuming – information
But someone must “own” it by
accepting responsibility for its quality
and accuracy
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin
Information Cleanliness




Related to ownership and
responsibility for quality and accuracy
No duplicate information
No redundant records with slightly
different data, such as the spelling of
a customer name
GIGO – if you have garbage
information you get garbage
information for decision making
© McGraw-Hill Companies, Inc., McGraw-Hill/Irwin