Ralph M. Stair | George W. Reynolds
Chapter 3
Database Systems and
Applications
© 2016 Cengage Learning®. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
Principles and Learning Objectives: Data
Management and Modeling
• Data management and modeling are key
aspects of organizing data and information
– Define general data management concepts
and terms, highlighting the advantages of the
database approach to data management
– Describe logical and physical database
design considerations, the function of data
centers, and the relational database model
Principles and Learning Objectives:
Database Support in Decision Making
• A well-designed and well-managed
database is an extremely valuable tool in
supporting decision making
– Identify the common functions performed by
all database management systems, and
identify popular database management
systems
Principles and Learning Objectives:
Evolving Database Applications
• The number and types of database
applications will continue to evolve and
yield real business benefits
– Identify and briefly discuss business
intelligence, data mining, and other database
applications
Why Learn About Database Systems and
Applications?
• A huge amount of data is captured for
processing by computers every day
• Learning about database systems and
applications can help you make the most
effective use of information
• Databases and applications to extract and
analyze valuable information can help you
succeed in your career
Introduction
• Database: an organized collection of data
• A database management system (DBMS)
is a group of programs that:
– Manipulate the database
– Provide an interface between the database
and its users and other application programs
Data Management
• Without data and the ability to process it:
– An organization could not successfully
complete most business activities
• Data consists of raw facts
• Data must be organized in a meaningful
way to transform it into useful information
The Hierarchy of Data
• A bit (binary digit) represents a circuit that
is either on or off
• A byte is made up of eight bits
– Each byte represents a character
• Field: a name, number, or combination of
characters that describes an aspect of a
business object or activity
The Hierarchy of Data (cont’d.)
• Record: a collection of related data fields
• File: a collection of related records
• Database: a collection of integrated and
related files
• Hierarchy of data: bits, characters, fields,
records, files, and databases
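The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the employee names, field names, and values are made up, and a real DBMS stores these structures quite differently.

```python
# Hypothetical illustration of the hierarchy of data; all names and values are invented.
bits = format(ord("A"), "08b")   # one character written out as eight bits (one byte)
char = chr(int(bits, 2))         # the same byte read back as a character

field = "Fisher"                 # a field: a combination of characters
record = {"employee_id": 5, "last_name": field, "hire_date": "2011-01-05"}  # related fields
employee_file = [                # a file: a collection of related records
    record,
    {"employee_id": 98, "last_name": "Buckley", "hire_date": "2012-02-17"},
]
database = {"employee": employee_file, "department": []}  # integrated, related files

print(bits, char, len(employee_file), sorted(database))
```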
An Example of Hierarchy of Data
Data Entities, Attributes, and Keys
• Entity: a person, place, or thing for which
data is collected, stored, and maintained
• Attribute: a characteristic of an entity
• Data item: the specific value of an attribute
• Primary key: a field or set of fields that
uniquely identifies the record
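As a minimal sketch of how a primary key uniquely identifies a record, the example below uses Python's built-in sqlite3 module. The employee table and its columns are assumptions made for illustration, not part of the slides.

```python
import sqlite3

# Entity: employee. Attributes: employee_id, last_name, dept_num (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_id INTEGER PRIMARY KEY, last_name TEXT, dept_num INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Fisher', 257)")  # data items: 1, 'Fisher', 257

try:
    # A second record with the same primary key value is rejected.
    conn.execute("INSERT INTO employee VALUES (1, 'Buckley', 650)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_rejected, row_count)
```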
Keys and Attributes
The Database Approach
• Traditional approach to data management
– Each distinct operational system used data
files dedicated to that system
• Database approach to data management
– Information systems share a pool of related
data
Database Approach to Data Management
Data Centers, Data Modeling and
Database Characteristics
• Considerations when building a database
– Content: what data should be collected and at
what cost?
– Access: what data should be provided to
which users and when?
– Logical structure: how should data be
arranged so that it makes sense?
– Physical organization: where should data be
physically located?
Data Modeling
• Data model: a diagram of data entities and
their relationships
• Enterprise data modeling: data modeling
done at the level of the entire enterprise
• Entity-relationship (ER) diagrams: data
models that use basic graphical symbols
to show the organization of and
relationships between data
Entity-Relationship (ER) Diagram for a
Customer Order Database
Relational Database Model
• Relational model: a simple but highly
useful way to organize data into
collections of two-dimensional tables
called relations
• Relational model databases include:
– Oracle, IBM DB2, Microsoft SQL Server,
Microsoft Access, MySQL, and Sybase
• Domain: range of allowable values for a
data attribute
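One way a relational DBMS can enforce a domain is with a CHECK constraint. The sketch below assumes a hypothetical employee table in SQLite where pay_rate must be non-negative and status must be one of two allowed values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        pay_rate    REAL CHECK (pay_rate >= 0),
        status      TEXT CHECK (status IN ('full-time', 'part-time'))
    )
""")

def try_insert(row):
    """Return True if the row fits every attribute's domain, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, 18.5, "full-time")),  # inside both domains
    try_insert((2, -4.0, "part-time")),  # pay_rate outside its domain
    try_insert((3, 20.0, "retired")),    # status outside its domain
]
print(results)
```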
Manipulating Data
• Selecting: eliminating rows according to
certain criteria
• Projecting: eliminating columns in a table
• Joining: combining two or more tables
• Linking: combining two or more tables
through common data attributes to form a
new table with only the unique data
attributes
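Selecting, projecting, and joining can each be expressed in SQL. The sketch below runs them through Python's sqlite3 module against two hypothetical tables (department and employee); the table names, columns, and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_num INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_num INTEGER);
    INSERT INTO department VALUES (257, 'Accounting'), (650, 'Marketing');
    INSERT INTO employee VALUES (1, 'Fisher', 257), (2, 'Buckley', 650), (3, 'Adams', 257);
""")

# Selecting: keep only the rows that meet a criterion.
selected = conn.execute(
    "SELECT * FROM employee WHERE dept_num = 257 ORDER BY emp_id"
).fetchall()

# Projecting: keep only some of the columns.
projected = conn.execute("SELECT name FROM employee ORDER BY emp_id").fetchall()

# Joining: combine two tables through the common attribute dept_num.
joined = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee AS e JOIN department AS d ON e.dept_num = d.dept_num
    ORDER BY e.emp_id
""").fetchall()

print(selected, projected, joined, sep="\n")
```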
Simplified ER Diagram
Linking Data Tables to Answer an Inquiry
Data Cleansing
• Also called data cleaning or data
scrubbing
• The process of detecting and then
correcting or deleting incomplete,
incorrect, inaccurate, or irrelevant records
that reside in a database
• The cost of performing data cleansing can
be quite high
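A minimal sketch of data cleansing in Python follows. The records and the rules (trim whitespace, lowercase e-mail addresses, drop records with a missing e-mail or a negative age, drop duplicates) are assumptions chosen for illustration; real cleansing rules depend on the data and the business.

```python
# Hypothetical raw records with typical quality problems.
raw = [
    {"name": " Ann Lee ", "email": "ANN@EXAMPLE.COM", "age": "34"},
    {"name": "Ann Lee", "email": "ann@example.com", "age": "34"},  # duplicate once normalized
    {"name": "Bo Chen", "email": "", "age": "41"},                 # incomplete: no e-mail
    {"name": "Cy Diaz", "email": "cy@example.com", "age": "-3"},   # incorrect: negative age
]

def cleanse(records):
    """Correct what can be corrected and delete what cannot."""
    seen_emails = set()
    clean = []
    for rec in records:
        name = rec["name"].strip()             # correct stray whitespace
        email = rec["email"].strip().lower()   # correct inconsistent case
        age = int(rec["age"])
        if not email or age < 0:               # delete incomplete or incorrect records
            continue
        if email in seen_emails:               # delete duplicates
            continue
        seen_emails.add(email)
        clean.append({"name": name, "email": email, "age": age})
    return clean

cleaned = cleanse(raw)
print(cleaned)
```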
Database Management Systems
• Creating and implementing the right
database system ensures that the
database will support both business
activities and goals
• Capabilities and types of database
systems vary considerably
Overview of Database Types
• Single-user DBMS
– Installed on a personal computer and meant
for a single user
– Examples: Microsoft Access and InfoPath,
Lotus Approach, and Personal Oracle
• Multiple-user DBMS
– Allows dozens or hundreds of people to
access the same system at the same time
– Vendors: Oracle, Microsoft, Sybase, and IBM
Overview of Database Types (cont’d.)
• Flat file
– Simplest database program
– The records have no relationship to one
another
– Store and manipulate a single table or file
– Examples: OneNote and Evernote
SQL Databases
• SQL: a special-purpose programming
language for accessing and manipulating
data stored in a relational database
• SQL databases conform to ACID
properties:
– Atomicity, consistency, isolation, and durability
• 1986: SQL was adopted by ANSI as the
standard query language for relational
databases
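Atomicity, the first of the ACID properties, means a transaction either completes entirely or has no effect. The sketch below assumes a hypothetical account table in SQLite whose balance may not go negative; a transfer that would overdraw account A is rolled back, including the credit already applied to account B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    """Move amount from A to B as one transaction; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer(500)        # would overdraw A, so both updates are undone
after_failed = balances()
succeeded = transfer(30)
after_success = balances()
print(failed, after_failed, succeeded, after_success)
```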
Table 3.1 Examples of SQL Commands
• SQL command: SELECT ClientName, Debt FROM Client WHERE Debt > 1000
– Description: This query displays all clients (ClientName) and the amount they owe the company (Debt) from a database table called Client for clients who owe the company more than $1,000 (WHERE Debt > 1000).
• SQL command: SELECT ClientName, ClientNum, OrderNum FROM Client, Order WHERE Client.ClientNum=Order.ClientNum
– Description: This command is an example of a join command that combines data from two tables: the Client table and the Order table (FROM Client, Order). The command creates a new table with the client name, client number, and order number (SELECT ClientName, ClientNum, OrderNum). Both tables include the client number, which allows them to be joined. This ability is indicated in the WHERE clause, which states that the client number in the Client table is the same as (equal to) the client number in the Order table (WHERE Client.ClientNum=Order.ClientNum).
• SQL command: GRANT INSERT ON Client TO Guthrie
– Description: This command is an example of a security command. It allows Bob Guthrie to insert new values or rows into the Client table.
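The first two commands in Table 3.1 can be tried with Python's sqlite3 module, as sketched below with made-up rows. Two adjustments are needed for SQLite and are assumptions of this sketch rather than part of the table: ORDER is a reserved word, so the table name is quoted as "Order", and ClientNum appears in both tables, so the SELECT list qualifies it as Client.ClientNum. SQLite does not implement GRANT, so the third command is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Client (ClientNum INTEGER PRIMARY KEY, ClientName TEXT, Debt REAL);
    CREATE TABLE "Order" (OrderNum INTEGER PRIMARY KEY, ClientNum INTEGER);
    INSERT INTO Client VALUES (1, 'Acme', 2500), (2, 'Birch', 400);
    INSERT INTO "Order" VALUES (10, 1), (11, 1), (12, 2);
""")

# Clients who owe more than $1,000.
debtors = conn.execute(
    "SELECT ClientName, Debt FROM Client WHERE Debt > 1000"
).fetchall()

# Join of Client and Order on the shared client number.
orders = conn.execute("""
    SELECT ClientName, Client.ClientNum, OrderNum
    FROM Client, "Order"
    WHERE Client.ClientNum = "Order".ClientNum
    ORDER BY OrderNum
""").fetchall()

print(debtors)
print(orders)
```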
NoSQL Databases
• A database designed to store and retrieve
data in a manner that does not rigidly
enforce the ACID conditions associated
with the relational database model
– Provides faster performance and greater
scalability
• Examples
– Cassandra used by Facebook
– DynamoDB used by Amazon
Visual, Audio, and Other Database
Systems
• Visual databases store images of charge
slips, X-rays, and vital records
– Images can be stored in some object-relational
databases or special-purpose
database systems
• Spatial databases provide location-based
services
– Maps are embedded into a Web site’s Web
applications and operational systems
Database Activities
• Providing a user view of the database
• Adding and modifying data
• Storing and retrieving data
• Manipulating the data and generating
reports
Providing a User View
• Schema: a description of the entire
database
• A schema can be part of the database or a
separate schema file
• The DBMS can reference a schema to find
where to access the requested data in
relation to another piece of data
Creating and Modifying the Database
• Data definition language (DDL)
– A collection of instructions and commands
used to define and describe data and
relationships in a specific database
– Allows the database’s creator to describe data
and relationships that are to be contained in
the schema
• Data dictionary: a detailed description of
all the data used in the database
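In SQL systems, the CREATE TABLE statement plays the role of the data definition language. The sketch below defines a hypothetical part table in SQLite, then reads the stored schema and column details back from the database. SQLite's sqlite_master table and PRAGMA table_info are used here only as a rough stand-in for a data dictionary; a full data dictionary usually records much more, such as descriptions and ownership.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the data and its structure (hypothetical table and columns).
conn.execute("""CREATE TABLE part (
    part_no     INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    unit_price  REAL
)""")

# The schema text is stored inside the database itself.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'part'"
).fetchone()[0]

# Column-level details: (name, type, required?, part of primary key?).
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in conn.execute("PRAGMA table_info(part)")
]

print(schema_sql)
print(columns)
```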
Data Definition Language
Data Dictionary Entry
Storing and Retrieving Data
• When an application program needs data,
it requests the data through the DBMS
• Concurrency control deals with the
situation in which two or more users or
applications need to access the same
record at the same time
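One common concurrency control technique is optimistic concurrency control with a version number: an update succeeds only if the record is unchanged since it was read. The sketch below simulates two users who read the same hypothetical seat record and then both try to reserve it. This is only one approach; many DBMSs rely on locking instead, and the slides do not say which technique a given product uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no TEXT PRIMARY KEY, holder TEXT, version INTEGER)")
conn.execute("INSERT INTO seat VALUES ('12A', NULL, 0)")
conn.commit()

def reserve(holder, version_read):
    """Update the seat only if nobody changed it since version_read was read."""
    cur = conn.execute(
        "UPDATE seat SET holder = ?, version = version + 1 "
        "WHERE seat_no = '12A' AND version = ?",
        (holder, version_read),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows updated means someone else got there first

# Both users read the record at the same version before either one writes.
version_seen = conn.execute("SELECT version FROM seat WHERE seat_no = '12A'").fetchone()[0]
first = reserve("Ann", version_seen)
second = reserve("Bo", version_seen)  # stale version, so this update is refused
holder = conn.execute("SELECT holder FROM seat WHERE seat_no = '12A'").fetchone()[0]
print(first, second, holder)
```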
Logical and Physical Access Paths
Manipulating Data and Generating
Reports
• Query by Example (QBE) is a visual
approach to developing database queries
or requests
• Data manipulation language (DML): a
specific language, provided with a DBMS
– Allows users to access and modify the data,
to make queries, and to generate reports
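In SQL, the data manipulation language consists of statements such as INSERT, UPDATE, DELETE, and SELECT. The sketch below applies each of them to a hypothetical sale table in SQLite and ends with a small summary report by region; the table and figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# INSERT: add data (sale_id values 1 to 4 are assigned automatically).
conn.executemany(
    "INSERT INTO sale (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0), ("West", 75.0)],
)
# UPDATE and DELETE: modify and remove data.
conn.execute("UPDATE sale SET amount = 300.0 WHERE sale_id = 2")
conn.execute("DELETE FROM sale WHERE sale_id = 4")

# SELECT: query the data and summarize it as a report.
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sale GROUP BY region ORDER BY region"
).fetchall()

for region, count, total in report:
    print(f"{region:<6}{count:>3}{total:>10.2f}")
```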
Manipulating Data and Generating
Reports (cont’d.)
• Once a database has been set up and
loaded with data, it can produce reports,
documents, and other outputs
• A DBMS can produce a wide variety of
documents, reports, and other output that
can help organizations achieve their goals
Database Administration
• DBA: a skilled and trained IS professional
– Works with users to define their data needs
– Applies database programming languages to
craft a set of databases to meet those needs
– Tests and evaluates databases
– Implements changes to improve their
databases’ performance
– Assures that data is secure from unauthorized
access
Database Administration (cont’d.)
• Data administrator: a nontechnical position
responsible for defining and implementing
consistent principles for a variety of data
issues
• The data administrator can be a high-level
position reporting to top-level managers
Table 3.2 Popular Database Management Systems
• Open-source relational DBMS: MySQL, PostgreSQL, MariaDB, SQLite
• Relational DBMS for individuals and workgroups: Microsoft Access, IBM Lotus Approach, Google Base, OpenOffice Base
• Relational DBMS for workgroups and enterprise: Oracle, IBM DB2, Sybase, Teradata, Microsoft SQL Server, Progress OpenEdge
• NoSQL DBMS: MongoDB, Cassandra, Redis, CouchDB
Popular Database Management Systems
• Database as a Service (DaaS)
– The database is stored on a service provider’s
servers
– The database is accessed by the client over a
network, typically the Internet
– Database administration is handled by the
service provider
• Example of DaaS: Amazon Relational
Database Service (Amazon RDS)
Using Databases with Other Software
• DBMSs can act as front-end or back-end
applications
– Front-end applications interact directly with
people
– Back-end applications interact with other
programs or applications
• Spin-off database applications include:
– Big data, data warehouses and data marts,
and business intelligence
Big Data
• Extremely large and complex data
collections
– Traditional data management software,
hardware, and analysis processes are
incapable of dealing with them
• Three characteristics of big data
– Volume
– Velocity
– Variety
Table 3.3 Big Data Generators
Source: magnitude of data generated
• Large Hadron particle accelerator at CERN: 40 terabytes of data per second
• Commercial aircraft engines: more than 1 petabyte per day of sensor data
• Cell phones: more than 5 billion people worldwide are making cell phone calls, exchanging text messages, and accessing Web sites
• YouTube: 48 hours of video uploaded per minute
• Facebook: 100 terabytes uploaded per day
• Twitter: 500 million tweets per day
• RFID tags: 1,000 times the volume of data generated by bar codes
Challenges of Big Data
• How to choose what subset of the data to
store
• Where and how to store the data
• How to find the nuggets of data that are
relevant to the decision making at hand
• How to derive value from the relevant data
In-Memory Databases
• A database management system that
stores the entire database in random
access memory (RAM)
• Enable the analysis of big data and other
challenging data-processing applications
• Perform best on multiple multicore CPUs
Data Warehouses and Data Marts
• Data warehouse: a large database that
collects business information from many
sources in the enterprise in support of
management decision making
• ETL process
– Extract
– Transform
– Load
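The three ETL steps can be sketched as three small Python functions. Everything specific here is an assumption for illustration: the CSV text stands in for an operational source system, the transform rules (skip rows with no amount, standardize region names) are invented, and fact_order is a hypothetical warehouse table in SQLite.

```python
import csv
import io
import sqlite3

# Stand-in for data exported from an operational system (hypothetical).
SOURCE_CSV = "order_id,region,amount\n1,east,100.50\n2,WEST,20\n3,East,\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, standardize region names, convert types."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((int(row["order_id"]), row["region"].title(), float(row["amount"])))
    return out

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_order "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_order VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute("SELECT * FROM fact_order ORDER BY order_id").fetchall()
print(loaded)
```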
Elements of a Data Warehouse
Data Warehouses and Data Marts
(cont’d.)
• Data mart: a subset of a data warehouse
that is used by small- and medium-sized
businesses and departments within large
companies to support decision making
• A specific area in the data mart might
contain more detailed data than the data
warehouse
Business Intelligence
• A broad range of technologies and
applications
– These enable an organization to analyze mostly
structured data obtained from information
systems, generate information, and improve
the organization's decision making
Business Intelligence (cont’d.)
• Technologies include:
– Data mining
– Online analytical processing
– Predictive analytics
– Data visualization
– Competitive intelligence
Data Mining
• An information-analysis tool that involves
the automated discovery of patterns and
relationships in a data warehouse
• Provides bottom-up, discovery-driven
analysis
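As a toy sketch of bottom-up, discovery-driven analysis, the Python below counts which pairs of products appear together in purchases, without being told in advance which pairs to look for. The baskets and the support threshold are made up, and simple pair counting is only a small stand-in for the much richer techniques that data-mining tools apply.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records (one set of products per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "eggs"},
]

# Count every pair of products that occurs together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep only pairs seen in at least min_support baskets.
min_support = 3
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)
```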
Online Analytical Processing (OLAP)
• A form of analysis that allows users to
explore data from a number of
perspectives, enabling a style of analysis
known as “slicing and dicing”
• Provides top-down, query-driven data
analysis
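"Slicing and dicing" can be sketched as grouping and filtering the same facts along different dimensions. The Python below uses invented sales facts with three assumed dimensions (year, region, product); the rollup helper is hypothetical, and real OLAP tools work on much larger, precomputed structures.

```python
from collections import defaultdict

DIMENSIONS = ("year", "region", "product")

# Hypothetical sales facts: (year, region, product, amount).
sales = [
    ("2015", "East", "Widgets", 100),
    ("2015", "West", "Widgets", 80),
    ("2015", "East", "Gadgets", 40),
    ("2016", "East", "Widgets", 120),
    ("2016", "West", "Gadgets", 60),
]

def rollup(facts, group_by, where=None):
    """Total the amount per combination of group_by dimensions, after filtering."""
    where = where or {}
    totals = defaultdict(int)
    for fact in facts:
        row = dict(zip(DIMENSIONS + ("amount",), fact))
        if all(row[dim] == value for dim, value in where.items()):
            totals[tuple(row[dim] for dim in group_by)] += row["amount"]
    return dict(totals)

by_region = rollup(sales, ["region"])                                   # one perspective
slice_2015 = rollup(sales, ["product"], where={"year": "2015"})         # slice: fix one dimension
dice = rollup(sales, ["year", "region"], where={"product": "Widgets"})  # dice: sub-cube
print(by_region, slice_2015, dice, sep="\n")
```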
Table 3.6 Comparison of OLAP and Data Mining
• Purpose
– OLAP: Supports data analysis and decision making
– Data mining: Supports data analysis and decision making
• Type of analysis supported
– OLAP: Top-down, query-driven data analysis
– Data mining: Bottom-up, discovery-driven data analysis
• Skills required of user
– OLAP: Must be very knowledgeable of the data and its business context
– Data mining: Must trust in data-mining tools to uncover valid and worthwhile hypotheses
Predictive Analysis
• Also called predictive analytics
• A form of data mining that combines
historical data with assumptions about
future conditions to predict outcomes of
events, e.g., future product sales or the
probability that a customer will default on a
loan
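A very simple form of prediction from historical data is to fit a straight line and extend it. The sketch below fits a least-squares trend line to four quarters of invented unit sales and forecasts the fifth quarter. The assumption about future conditions is that the past linear trend continues; real predictive analytics typically uses richer models and more variables.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

# Hypothetical historical data: units sold in quarters 1 to 4.
quarters = [1, 2, 3, 4]
units = [10, 13, 13, 16]

slope, intercept = fit_line(quarters, units)
forecast_q5 = slope * 5 + intercept  # assumes the trend continues into quarter 5
print(round(slope, 2), round(intercept, 2), round(forecast_q5, 2))
```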
Data Visualization
• In analyzing data, charts and graphs make
it easier to:
– See trends and patterns
– Identify opportunities for further analysis
• Software examples
– Excel and SAS Visual Analytics
Data Visualization: Social Graph Analysis
• A data visualization technique in which
data is represented as networks
– Vertices are the individual data points (social
network users)
– Edges are the connections among the
vertices
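The vertices and edges of a social graph map naturally onto an adjacency structure. The sketch below builds one in Python from an invented list of connections and counts each user's connections (the vertex degree); the user names are hypothetical, and drawing the network is left to a visualization tool.

```python
from collections import defaultdict

# Hypothetical connections (edges) between social network users (vertices).
edges = [("ann", "bo"), ("ann", "cy"), ("bo", "cy"), ("cy", "dee")]

# Adjacency sets: each user maps to the set of users they are connected to.
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

# Degree: the number of edges touching each vertex.
degree = {user: len(friends) for user, friends in graph.items()}
most_connected = max(degree, key=degree.get)
print(degree, most_connected)
```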
Data Visualization: Key Performance
Indicators (KPIs) and Dashboards
• KPIs: quantifiable measurements that
assess progress toward organizational
goals and reflect the critical success
factors of an organization
• Dashboard: a data visualization tool that
displays the current status of the key
performance indicators (KPIs) for an
organization
Competitive Intelligence
• Competitive intelligence encompasses
information about competitors and the
ways that knowledge affects strategy,
tactics, and operations
• Counterintelligence describes the steps an
organization takes to protect information
sought by “hostile” intelligence gatherers
Summary – Principle 1
• Data is one of the most valuable resources
that a firm possesses
• An entity is an object for which data is
collected, stored, and maintained
• Database considerations: content, access,
logical structure, and physical organization
• The relational model places data in two-dimensional tables
Summary – Principle 2
• A DBMS is a group of programs used as
an interface between a database and its
users and other application programs
• DBMS basic functions include:
– Providing user views
– Creating and modifying the database
– Storing and retrieving data
– Manipulating data and generating reports
Summary – Principle 2 (cont’d.)
• After a DBMS has been installed, the
database can be accessed, modified, and
queried via a data manipulation language
• A database administrator (DBA) plans,
designs, creates, operates, secures,
monitors, and maintains databases
• Database as a Service (DaaS) is a form of
database service in which the provider hosts
and administers the database and clients
access it over a network
Summary – Principle 3
• “Big data” is the term used to describe
enormous and complex data collections
• A data warehouse is a large database that
collects business information from many
sources to support management decision making
Summary – Principle 3 (cont’d.)
• Data mining allows the automated
discovery of patterns and relationships in a
data warehouse
• Counterintelligence describes the steps an
organization takes to protect information
sought by “hostile” intelligence gatherers