Chapter 13
by Coby
California, Santa
Santa Barbara
College College
Data and Databases
Study Objectives
The need for data collection and storage
Methods of storing data and the interrelationship between storage and processing
The differences between batch processing and real-time processing
The importance of databases and the historical progression from flat-file
databases to relational databases
The need for normalization of data in a relational database
Data warehouse and the use of a data warehouse to analyze data
The use of OLAP and data mining as analysis tools
Distributed databases and advantages of the use of distributed data
Controls for data and databases
Ethical issues related to data collection and storage, and their use in IT systems
Real World
Think about the volume of sales transactions
that occur on the Websites of large Internet retailers such
as L.L. Bean, Lands’ End, and J.Crew. These companies each process an average of
approximately 120,000 transactions each day on their Websites. For each of these
transactions, important data must be collected about the customer, location, payment,
and the items sold.
Even more overwhelming is the volume of sales transactions that are processed
by Wal-Mart on any given day. In addition to its Web-based sales, consider Wal-Mart’s
thousands of retail centers with several check-out lines at each location and long hours
of operation. Think about the number of accountants and computers that might be
required to manage all of the related records. It is no wonder that Wal-Mart has one of
the largest databases of any business organization in the world.
The Wal-Mart database continually grows with new transactions. Some estimate
that Wal-Mart adds 1 billion rows of data per day. The database is not only large; its
rate of growth is also increasing. The company attaches RFID chips to merchandise
so that inventory purchases, movement to stores, and sales are tracked in real time.
Since the data for these events get added to the database so quickly, the database
grows and becomes more useful for immediate analysis. This allows Wal-Mart to more
quickly analyze and forecast inventory needs.
The Need for Data Collection and Storage
Data are the set of facts collected from transactions,
whereas information is the interpretation of data that have
been processed.
Main reasons to store transaction data:
1. To complete transactions from beginning to end.
2. To follow up with customers or vendors and to expedite
future transactions.
3. To create accounting reports and financial statements.
4. To provide feedback to management.
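The distinction between data and information can be illustrated with a minimal sketch. The transaction rows and the function name sales_by_customer are hypothetical, chosen only for illustration: the raw rows are data, and the per-customer totals produced from them are information.

```python
# Hypothetical raw transaction data: (customer, item, amount).
transactions = [
    ("C001", "boots", 120.00),
    ("C002", "jacket", 85.50),
    ("C001", "socks", 12.25),
]

def sales_by_customer(rows):
    """Process raw facts (data) into a summary a manager can interpret (information)."""
    totals = {}
    for customer, _item, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals

information = sales_by_customer(transactions)
print(information)
```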
SO 1 The need for data collection and storage
The Need for Data Collection and Storage
Typical storage and processing techniques:
1. The storage media types for data: sequential and random
2. Methods of processing data: batch and real time
3. Databases and relational databases
4. Data warehouses, data mining, and OLAP
5. Distributed data processing and distributed databases
The Need for Data Collection and Storage
Concept Check
Which of the following best describes the relationship
between data and information?
a. Data are interpreted information.
b. Information is interpreted data.
c. Data are more useful than information in decision making.
d. Data and information are not related.
Storing and Accessing Data
Data Storage Terminology
► Character
► Field
► Record
► File
► Database
Exhibit 13-1
Data Hierarchy
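One way to picture the data hierarchy is as nested containers: characters make up a field, fields make up a record, records make up a file, and files make up a database. The sketch below is only an illustration; the class names and the sample customer values are assumptions, not part of the exhibit.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A record is a set of related fields; each field value is a string of characters."""
    fields: dict[str, str]

@dataclass
class File:
    """A file is a collection of similar records."""
    name: str
    records: list[Record] = field(default_factory=list)

@dataclass
class Database:
    """A database is a collection of related files."""
    files: dict[str, File] = field(default_factory=dict)

# Hypothetical sample data.
customers = File("customers", [
    Record({"customer_id": "C001", "name": "Ann"}),
    Record({"customer_id": "C002", "name": "Raj"}),
])
db = Database({"customers": customers})

# Walk down the hierarchy: database -> file -> record -> field -> character.
first_field = db.files["customers"].records[0].fields["name"]
first_character = first_field[0]
```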
SO 2 Methods of storing data and the interrelationship
between storage and processing
Storing and Accessing Data
Data Storage Media
► Magnetic tape
► Sequential access
► Random Access
Early Days of
Modern IT
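The difference between sequential and random access can be sketched with fixed-length records. This is an illustration under stated assumptions: the 16-byte record size, the in-memory store, and the function names are hypothetical. Sequential access must read every record ahead of the target, while random access jumps directly to a computed position.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes

def make_store(names):
    """Pack each name into a fixed-length record, padded with spaces."""
    return io.BytesIO(b"".join(n.encode("ascii").ljust(RECORD_SIZE) for n in names))

def read_sequential(store, index):
    """Tape-style access: read every record from the start until the target."""
    store.seek(0)
    reads = 0
    record = b""
    for _ in range(index + 1):
        record = store.read(RECORD_SIZE)
        reads += 1
    return record.decode("ascii").strip(), reads

def read_random(store, index):
    """Disk-style access: jump straight to the record's computed offset."""
    store.seek(index * RECORD_SIZE)
    return store.read(RECORD_SIZE).decode("ascii").strip(), 1

store = make_store(["Adams", "Baker", "Chen", "Diaz"])
seq_result = read_sequential(store, 3)   # (record, number of reads needed)
rand_result = read_random(store, 3)
```

Both calls return the same record; only the number of reads differs.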
Storing and Accessing Data
Concept Check
A character is to a field as
a. water is to a pool.
b. a pool is to a swimmer.
c. a pool is to water.
d. a glass is to water.
Storing and Accessing Data
Concept Check
Magnetic tape is a form of
a. direct access media.
b. random access media.
c. sequential access media.
d. alphabetical access media.
Data Processing Techniques
Exhibit 13-2
Comparison of Batch and
Real-Time Processing
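The contrast between the two processing methods can be sketched as follows. The account codes, amounts, and function names are hypothetical. Real-time processing posts each transaction as it occurs; batch processing collects similar transactions and posts them together in sorted order, for example at the end of the day.

```python
def process_real_time(balances, txn):
    """Post a single transaction the moment it occurs."""
    account, amount = txn
    balances[account] = balances.get(account, 0) + amount

def process_batch(balances, batch):
    """Post a group of similar transactions together, sorted by account."""
    for account, amount in sorted(batch):
        balances[account] = balances.get(account, 0) + amount

txns = [("A2", 50), ("A1", 20), ("A2", -10)]

real_time_balances = {}
for t in txns:                       # each event updates the balance immediately
    process_real_time(real_time_balances, t)

batch_balances = {}
process_batch(batch_balances, txns)  # one run over the accumulated group
```

The ending balances agree; what differs is when each balance becomes current.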
SO 3 The differences between batch processing and real-time processing
Data Processing Techniques
Concept Check
Which of the following is not an advantage of using real-time
data processing?
a. Quick response time to support timely record keeping
and customer satisfaction
b. Efficiency for use with large volumes of data
c. Provides for random access of data
d. Improved accuracy due to the immediate recording of
A database is a collection of data stored in a form that allows the data to be easily
accessed, retrieved, manipulated, and stored.
Exhibit 13-3
Traditional File-Oriented Approach
► Data redundancy
► Concurrency
SO 4 The importance of databases and the historical progression
from flat-file databases to relational databases
Exhibit 13-3
Database Approach
A database management system (DBMS) is
software that manages the database and
controls the access and use of data by
individual users and applications.
► One-to-One
► One-to-Many
► Many-to-Many
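The three relationship types (one-to-one, one-to-many, many-to-many) can be sketched in a small schema. This is a hedged illustration using Python's built-in sqlite3 module; every table and column name here is an assumption made for the example, not taken from the exhibit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
-- One-to-one: UNIQUE lets each customer have at most one loyalty card.
CREATE TABLE loyalty_card (card_id INTEGER PRIMARY KEY,
    customer_id INTEGER UNIQUE REFERENCES customer(customer_id));
-- One-to-many: many orders may point at the same customer.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
-- Many-to-many: a junction table pairs orders with products.
CREATE TABLE order_item (order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES product(product_id),
    PRIMARY KEY (order_id, product_id));
INSERT INTO customer VALUES (1, 'Ann');
INSERT INTO loyalty_card VALUES (10, 1);
INSERT INTO orders VALUES (100, 1), (101, 1);
INSERT INTO product VALUES (7, 'boots'), (8, 'socks');
INSERT INTO order_item VALUES (100, 7), (100, 8), (101, 8);
""")

orders_for_ann = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1").fetchone()[0]
orders_with_socks = conn.execute(
    "SELECT COUNT(*) FROM order_item WHERE product_id = 8").fetchone()[0]

# A second card for the same customer violates the one-to-one rule.
try:
    conn.execute("INSERT INTO loyalty_card VALUES (11, 1)")
    second_card_allowed = True
except sqlite3.IntegrityError:
    second_card_allowed = False
```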
The History of Databases
Flat File Database Model
► 1950s and 1960s
► Text format, sequential order
► Sequential processing
► Large volumes of similar
► Single record not easily retrieved or stored
Exhibit 13-4
Database Table
The History of Databases
Hierarchical Database Model
► Inverted tree structure
► Parent–child, represent one-to-many relationships
► Record pointer
Exhibit 13-5
Linkages in a Hierarchical
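The parent-child linkage of a hierarchical model can be sketched as a tree in which each record holds pointers to its children. The class, the function, and the sample record names below are hypothetical. Note that a record can only be reached by following pointers down from the root.

```python
class Node:
    """A record that holds pointers to its child records (one-to-many)."""
    def __init__(self, name):
        self.name = name
        self.children = []   # record pointers to child records

    def add(self, name):
        child = Node(name)
        self.children.append(child)
        return child

def find_path(node, target, path=()):
    """Follow pointers from the root down until the target record is found."""
    path = path + (node.name,)
    if node.name == target:
        return path
    for child in node.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None

root = Node("Company")
sales = root.add("Sales")
root.add("Accounting")
sales.add("Order 100")

path = find_path(root, "Order 100")
```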
The History of Databases
Network Database Model
► Inverted tree structure
► More complex relationship linkages by use of shared
► Not very popular, rarely used
The History of Databases
Relational Database Model
► Developed in 1969
► Stores data in two-dimensional tables
► Most widely used database structure today
► Examples include IBM DB2, Oracle Database, and
Microsoft Access®
Concept Check
If a company stores data in separate files in its different
departmental locations and is able to update all files
simultaneously, it would not have problems with
a. attributes.
b. data redundancy.
c. industrial espionage.
d. concurrency.
Concept Check
When the data contained in a database are stored in large,
two-dimensional tables, the database is referred to as a
a. flat file database.
b. hierarchical database.
c. network database.
d. relational database.
Concept Check
Database management systems are categorized by the data
structures they support. In which type of database
management system is the data arranged in a series of tables?
a. Network
b. Hierarchical
c. Relational
d. Sequential
The Need for Normalized Data
Relational databases consist of several small tables. Small
tables can be joined in ways that represent relationships
among the data.
Exhibit 13-6
Relational Database in
Microsoft Access
Bolded field is the
primary key.
SO 5 The need for normalization of data in a relational database
The Need for Normalized Data
A relational database offers flexibility in retrieving data.
Structured Query Language (SQL) has become the industry standard.
SELECT Customers.CustomerID, Customers.CompanyName,
Orders.OrderID, Orders.ShippedDate FROM Customers INNER
JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Exhibit 13-7
Relational Database in
Microsoft Access
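A join of this kind can be tried without Microsoft Access. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout follows the query's column names, while the sample rows are invented for illustration. The ORDER BY clause is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, ShippedDate TEXT);
-- Hypothetical sample rows; C002 has no orders.
INSERT INTO Customers VALUES ('C001', 'Sample Foods'), ('C002', 'Sample Imports');
INSERT INTO Orders VALUES (1, 'C001', '2013-01-15'), (2, 'C001', '2013-02-01');
""")

rows = conn.execute("""
SELECT Customers.CustomerID, Customers.CompanyName,
       Orders.OrderID, Orders.ShippedDate
FROM Customers INNER JOIN Orders
     ON Customers.CustomerID = Orders.CustomerID
ORDER BY Orders.OrderID;
""").fetchall()

for row in rows:
    print(row)
```

Because this is an inner join, the customer with no orders does not appear in the result.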
The Need for Normalized Data
The process of converting data into tables that meet the
definition of a relational database is called data normalization.
► There are seven rules of data normalization, and they are additive.
► Most relational databases are in third normal form.
► First three rules of data normalization are:
1. Eliminate repeating groups
2. Eliminate redundant data
3. Eliminate columns not dependent on primary key.
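The effect of these rules can be sketched on a small example. The flat rows and field names below are hypothetical. The customer name is repeated on every order row and depends on customer_id rather than on the order's own key, so the sketch moves it into a separate customers table and leaves only the customer_id link in the orders table.

```python
# Hypothetical unnormalized rows: customer_name repeats on every order.
flat_rows = [
    {"order_id": 1, "customer_id": "C001", "customer_name": "Ann", "item": "boots"},
    {"order_id": 2, "customer_id": "C001", "customer_name": "Ann", "item": "socks"},
    {"order_id": 3, "customer_id": "C002", "customer_name": "Raj", "item": "jacket"},
]

def normalize(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}   # keyed by customer_id, so each name is stored once
    orders = []
    for row in rows:
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],   # link back to customers
            "item": row["item"],
        })
    return customers, orders

customers, orders = normalize(flat_rows)
```

After the split, a change to a customer's name is made in one place instead of on every order row.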
The Need for Normalized Data
Trade-offs in Database Storage
Relational database
Not most efficient way to store data that will be
used in other ways.
Most organizations are willing to accept less
transaction processing efficiency for better query
The Need for Normalized Data
Concept Check
Which of the following statements is not true with regard to
a relational database?
a. It is flexible and useful for unplanned, ad hoc queries.
b. It stores data in tables.
c. It stores data in a tree formation.
d. It is maintained on direct access devices.
Use of a Data Warehouse to Analyze Data
Management often needs data from several fiscal periods
from across the whole organization.
Exhibit 13-8
The Data Warehouse and
Operational Databases
SO 6 Data warehouse and the use of a data warehouse to analyze data
Use of a Data Warehouse to Analyze Data
► Build the data warehouse
► Identify the data
► Standardize the data
► Cleanse, or scrub, the data
► Upload the data
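The steps above can be sketched as a small pipeline. Everything here is an assumption made for illustration: two hypothetical operational databases use different field names and date formats, standardize maps both into one layout, cleanse scrubs rows with a missing amount, and the result is loaded into a list standing in for the warehouse.

```python
from datetime import datetime

# Identify the data: hypothetical extracts from two operational databases.
sales_db = [
    {"cust": "c001", "date": "01/15/2013", "amount": "120.00"},
    {"cust": "C002", "date": "01/16/2013", "amount": ""},        # missing amount
]
web_db = [{"customer_id": "C003", "sold_on": "2013-01-17", "total": 85.5}]

def standardize(row, id_key, date_key, date_format, amount_key):
    """Standardize the data: one set of field names, one date format."""
    raw_amount = row[amount_key]
    return {
        "customer_id": row[id_key].upper(),
        "date": datetime.strptime(row[date_key], date_format).date().isoformat(),
        "amount": float(raw_amount) if raw_amount != "" else None,
    }

def cleanse(rows):
    """Cleanse, or scrub, the data: drop rows with a missing amount."""
    return [r for r in rows if r["amount"] is not None]

standardized = (
    [standardize(r, "cust", "date", "%m/%d/%Y", "amount") for r in sales_db]
    + [standardize(r, "customer_id", "sold_on", "%Y-%m-%d", "total") for r in web_db]
)

warehouse = []
warehouse.extend(cleanse(standardized))   # Upload the data
```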
Use of a Data Warehouse to Analyze Data
Concept Check
A collection of several years’ nonvolatile data used to support
strategic decision-making is a(n)
a. operational database.
b. data warehouse.
c. data mine.
d. what-if simulation.
Data Analysis Tools
Data mining is the process of searching for identifiable
patterns in data that can be used to predict future behavior.
Online Analytical Processing (OLAP) is a set of software
tools that allow online analysis of the data within a data
warehouse. Analytical methods in OLAP usually include:
1. Drill down
2. Consolidation
3. Pivoting
4. Time series analysis
5. Exception reports
6. What-if simulations
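Four of these methods can be sketched on a handful of warehouse rows. The facts, the threshold, and the function names are hypothetical, and real OLAP tools work on far larger data with dedicated engines; the sketch only shows what each operation does to the same set of facts.

```python
from collections import defaultdict

# Hypothetical warehouse facts: (region, store, quarter, sales).
facts = [
    ("West", "LA", "Q1", 100), ("West", "LA", "Q2", 120),
    ("West", "SF", "Q1", 80), ("East", "NYC", "Q1", 150),
    ("East", "NYC", "Q2", 30),
]

def consolidate(rows):
    """Consolidation: roll detailed rows up to a total per region."""
    totals = defaultdict(int)
    for region, _store, _quarter, sales in rows:
        totals[region] += sales
    return dict(totals)

def drill_down(rows, region):
    """Drill down: break one region's total into its stores."""
    totals = defaultdict(int)
    for r, store, _quarter, sales in rows:
        if r == region:
            totals[store] += sales
    return dict(totals)

def pivot(rows):
    """Pivoting: view the same facts as region-by-quarter cells."""
    table = defaultdict(dict)
    for region, _store, quarter, sales in rows:
        table[region][quarter] = table[region].get(quarter, 0) + sales
    return dict(table)

def exceptions(rows, minimum):
    """Exception report: list only the rows below a threshold."""
    return [row for row in rows if row[3] < minimum]
```

For example, consolidate(facts) gives one total per region, and drill_down(facts, "West") shows which stores make up the West total.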
SO 7 The use of OLAP and data mining as analysis tools
Data Analysis Tools
Concept Check
Data mining would be useful in all of the following situations except
a. identifying hidden patterns in customers’ buying habits.
b. assessing customer reactions to new products.
c. determining customers’ behavior patterns.
d. accessing customers’ payment histories.
Distributed Data Processing
Early days
► Centralized processing
► Centralized databases
Today’s IT Environment
► Distributed data processing (DDP)
► Distributed databases (DDB)
SO 8 Distributed databases and advantages of the use of distributed data
Real World
McDonald’s has restaurants, warehouses, and offices located throughout the world; yet its
corporate headquarters is in Oakbrook, Illinois. If McDonald’s management
decided that all data, including prices, must be stored in a database at
corporate headquarters, what would have to happen when you order a
cheeseburger at a McDonald’s in Los Angeles? The cash register system
would have to read pricing data from the database in Oakbrook, Illinois. This
would be inefficient for several reasons. First, each McDonald’s restaurant
would be trying to read the same database simultaneously in order to fill
customer orders all around the world. Each of the McDonald’s restaurants
would need to be networked to that data in Illinois and would need to be able
to read price data quickly in order to process the sale. This would generate
so much network traffic that it would very likely overwhelm the network and
computer system. In addition, if prices are stored only at corporate
headquarters, it would become more difficult for each location to set its own
prices. Certainly, it would be much more efficient for McDonald’s to maintain
pricing data at the local restaurants or in regional centers.
Distributed Data Processing
Distributing the processing and data offers the following advantages:
1. Reduced hardware cost
2. Improved responsiveness
3. Easier incremental growth
4. Increased user control and user involvement
5. Automatic integrated backup
The most popular type of distributed system is a
client/server system.
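The idea of keeping pricing data at each location, as in the McDonald's example, can be sketched as follows. The class name, items, and prices are hypothetical. Each site holds its own copy of the price data, so a sale is priced locally with no request to headquarters, and a site can set its own prices.

```python
# Hypothetical headquarters price catalog.
central_prices = {"cheeseburger": 1.00, "fries": 1.50}

class LocalSite:
    """A location that keeps its own copy of pricing data."""
    def __init__(self, name, central):
        self.name = name
        self.prices = dict(central)   # local copy: no network call per sale

    def set_local_price(self, item, price):
        """User control: the location adjusts its own price."""
        self.prices[item] = price

    def price_of(self, item):
        """Responsiveness: the lookup is answered from local data."""
        return self.prices[item]

los_angeles = LocalSite("Los Angeles", central_prices)
chicago = LocalSite("Chicago", central_prices)
los_angeles.set_local_price("cheeseburger", 1.25)
```

A real distributed database also has to keep the copies synchronized, which this sketch leaves out.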
Distributed Data Processing
Concept Check
A set of small databases where data are collected,
processed, and stored on multiple computers within a
network is a
a. centralized database.
b. distributed database.
c. flat file database.
d. high-impact process.
Cloud-Based Databases
Providers of cloud-based database services include companies
like Amazon (Amazon Elastic Compute Cloud), Google (Google
Cloud Storage), Microsoft (Windows Azure), and IBM (IBM
A company can buy data storage from these providers.
This arrangement is called Database as a Service (DaaS).
The cloud provider generally provides data storage space and
software tools to manage and control the database.
SO 09 Cloud-based databases
Real World
The best-selling jet airplane of the Boeing Corporation is the 737. In
2011, Boeing rolled out a new function called “737 Explained,” a
cloud-based database and application using Microsoft Azure cloud
storage. This cloud database stores 20,000 high-resolution photos of
the Boeing 737, which are accessible by the Boeing salespeople who
may be traveling to any location in the world to seek customers. 737
Explained can show 360-degree tours of the airplane, as well as
individual parts and features. The director of marketing at Boeing
said, “737 Explained is one of the best marketing tools I’ve seen
because it allows us to show prospective customers the new features
and improvements without bringing them to an airport.”
SO 10 Controls for data and databases
IT Controls for Data and Databases
To ensure integrity (completeness and accuracy) of data in
the database, IT application controls should be used. These
controls are
► input,
► processing, and
► output controls such as
1. data validation,
2. control totals and reconciliation, and
3. reports that are analyzed by managers.
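Two of these controls can be sketched in a few lines. The validation rules, field names, and batch values are assumptions chosen for illustration. validate is an input control that rejects incomplete or implausible records, and reconcile is a processing control that compares a batch against its expected record count and control total.

```python
def validate(txn):
    """Data validation: return a list of problems found in one record."""
    errors = []
    if not txn.get("customer_id"):
        errors.append("missing customer_id")
    amount = txn.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        errors.append("amount must be a positive number")
    return errors

def reconcile(batch, expected_count, expected_total):
    """Control totals: the batch must match the expected count and sum."""
    actual_total = sum(t["amount"] for t in batch)
    return len(batch) == expected_count and actual_total == expected_total

# Hypothetical input batch (whole-dollar amounts keep the totals exact).
batch = [
    {"customer_id": "C001", "amount": 120},
    {"customer_id": "", "amount": 85},
    {"customer_id": "C003", "amount": -5},
]

accepted = [t for t in batch if not validate(t)]
rejected = [(t, validate(t)) for t in batch if validate(t)]
```

The rejected list could feed an exception report for managers to review, which corresponds to the output control in the list above.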
SO 10 Controls for data and databases
Ethical Issues Related to Data Collection
Ethical Responsibilities of the Company
Data collected and stored in databases in many instances
consist of information that is private between the company
and its customer.
Ten privacy practices for online companies:
1. Management
2. Notice
3. Choice and consent
4. Collection
5. Use and retention
6. Access
7. Disclosure to third parties
8. Security for privacy
9. Quality
10. Monitoring and enforcement
SO 11 Ethical issues related to data collection and
storage, and their use in IT systems
Real World
No matter how extensive the controls in place, it is never possible to
completely eliminate unauthorized access. In April of 2011, Netflix
disclosed that it had fired an unnamed call center employee for
stealing credit card information from customers he had spoken with
on the phone. The company declined to disclose the number of
customers affected. The “monitoring and enforcement” practice mentioned
above is intended to help discover problems such as this and to fix
them quickly. In this case, a Netflix spokesperson said, “We do
everything we can to safeguard our members’ personal data and
privacy, and when there’s an issue like this, we deal with it swiftly and
Ethical Issues Related to Data Collection
Ethical Responsibilities of Employees
Employees have an ethical obligation to avoid misuse of any
private or personal data about customers.
There are no specific IT controls that would always prevent
authorized employees from disclosing private information.
Ethical Issues Related to Data Collection
Ethical Responsibilities of Customers
Customers have an obligation to
► provide accurate and complete information.
► keep any known company information confidential.
► avoid improper use of data that they gain from accessing
a database as a customer.
Real World
Near Lexington, Kentucky, the breeding and racing of thoroughbred horses is a significant
industry. Tracking the bloodlines of the thoroughbreds used as studs in
breeding is important information to those who breed and race these horses.
During the 1970s, a company named Bloodstock began maintaining a
database of stud horse and mare bloodlines and race handicapping data.
Breeders and others could establish an account with Bloodstock and access
this computer database in choosing a stud horse to use for breeding or for
handicapping races. Eventually, this database became a Web-based
resource called BRISNET. In 1997, someone began establishing and using
fictitious customer accounts to access the BRISNET database. Over a period
of months, this person accessed and downloaded BRISNET data. He then
posted these data to his own database and Website and began selling the
data at prices below those charged by Bloodstock. Upon discovery of this
unethical act, the United States Attorney of the district, surprisingly, declined
to charge the violator with federal crimes. However, Bloodstock settled out of
court with the violator for an undisclosed dollar amount.
Ethical Issues Related to Data Collection
Concept Check
Each of the following is an online privacy practice recommended
by the AICPA Trust Services Principles Privacy Framework except:
a. Redundant data should be eliminated from the database.
b. Notification of privacy policies should be given to customers.
c. Private information should not be given to third parties
without the customer’s consent.
d. All of the above.
Copyright © 2013 John Wiley & Sons, Inc. All rights reserved.
Reproduction or translation of this work beyond that permitted in
Section 117 of the 1976 United States Copyright Act without the
express written permission of the copyright owner is unlawful.
Request for further information should be addressed to the
Permissions Department, John Wiley & Sons, Inc. The purchaser
may make back-up copies for his/her own use only and not for
distribution or resale. The Publisher assumes no responsibility for
errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.