SECOND EDITION
Chapter 13: Data and Databases
Prepared by Coby Harmon
University of California, Santa Barbara
Westmont College

Study Objectives
1. The need for data collection and storage
2. Methods of storing data and the interrelationship between storage and processing
3. The differences between batch processing and real-time processing
4. The importance of databases and the historical progression from flat-file databases to relational databases
5. The need for normalization of data in a relational database
6. Data warehouse and the use of a data warehouse to analyze data
7. The use of OLAP and data mining as analysis tools
8. Distributed databases and advantages of the use of distributed data
9. Controls for data and databases
10. Ethical issues related to data collection and storage, and their use in IT systems

Real World
Think about the volume of sales transactions that occur on the Websites of large Internet retailers such as L.L. Bean, Lands’ End, and J.Crew. These companies each process an average of approximately 120,000 transactions each day on their Websites. For each of these transactions, important data must be collected about the customer, location, payment, and the items sold. Even more overwhelming is the volume of sales transactions that are processed by Wal-Mart on any given day. In addition to its Web-based sales, consider Wal-Mart’s thousands of retail centers with several check-out lines at each location and long hours of operation. Think about the number of accountants and computers that might be required to manage all of the related records. It is no wonder that Wal-Mart has one of the largest databases of any business organization in the world. The Wal-Mart database continually grows with new transactions. Some estimate that Wal-Mart adds 1 billion rows of data per day. Beyond its sheer size, the database is also growing at an increasing rate.
The company attaches RFID chips to merchandise so that inventory purchases, movement to stores, and sales are tracked in real time. Since the data for these events get added to the database so quickly, the database grows and becomes more useful for immediate analysis. This allows Wal-Mart to more quickly analyze and forecast inventory needs.

The Need for Data Collection and Storage
Data are the set of facts collected from transactions, whereas information is the interpretation of data that have been processed.
Main reasons to store transaction data:
1. To complete transactions from beginning to end.
2. To follow up with customers or vendors and to expedite future transactions.
3. To create accounting reports and financial statements.
4. To provide feedback to management.
SO 1 The need for data collection and storage

The Need for Data Collection and Storage
Typical storage and processing techniques:
1. The storage media types for data: sequential and random access
2. Methods of processing data: batch and real time
3. Databases and relational databases
4. Data warehouses, data mining, and OLAP
5. Distributed data processing and distributed databases

The Need for Data Collection and Storage
Concept Check
Which of the following best describes the relationship between data and information?
a. Data are interpreted information.
b. Information is interpreted data.
c. Data are more useful than information in decision making.
d. Data and information are not related.

Storing and Accessing Data
Data Storage Terminology
Exhibit 13-1 Data Hierarchy (smallest to largest unit):
► Character
► Field
► Record
► File
► Database
SO 2 Methods of storing data and the interrelationship between storage and processing

Storing and Accessing Data
Data Storage Media
Early Days of Mainframe Computers
► Magnetic tape
► Sequential access
Modern IT Systems
► Random access

Storing and Accessing Data
Concept Check
A character is to a field as
a. water is to a pool.
b. a pool is to a swimmer.
c. a pool is to water.
d. a glass is to water.

Storing and Accessing Data
Concept Check
Magnetic tape is a form of
a. direct access media.
b. random access media.
c. sequential access media.
d. alphabetical access media.

Data Processing Techniques
► Batch processing
► Real-time processing
Exhibit 13-2 Comparison of Batch and Real-Time Processing
SO 3 The differences between batch processing and real-time processing

Data Processing Techniques
Concept Check
Which of the following is not an advantage of using real-time data processing?
a. Quick response time to support timely record keeping and customer satisfaction
b. Efficiency for use with large volumes of data
c. Provides for random access of data
d. Improved accuracy due to the immediate recording of transactions

Databases
A database is a collection of data stored in a form that allows the data to be easily accessed, retrieved, manipulated, and stored.
Exhibit 13-3 Traditional File-Oriented Approach
Problems with the file-oriented approach:
► Data redundancy
► Concurrency
SO 4 The importance of databases and the historical progression from flat-file databases to relational databases

Databases
Exhibit 13-3 Database Approach
A Database Management System (DBMS) is software that manages the database and controls the access and use of data by individual users and applications.
Relationships
► One-to-One
► One-to-Many
► Many-to-Many

The History of Databases
Exhibit 13-4 Database Table
Flat File Database Model
► 1950s and 1960s
► Large volumes of similar transactions
► Single record not easily retrieved or stored
► Text format, sequential order
► Sequential processing

The History of Databases
Hierarchical Database Model
► Inverted tree structure
► Parent–child links represent one-to-many relationships
► Record pointer
Exhibit 13-5 Linkages in a Hierarchical Database

The History of Databases
Network Database Model
► Inverted tree structure
► More complex relationship linkages by use of shared branches
► Not very popular, rarely used

The History of Databases
Relational Database Model
► Developed in 1969
► Stores data in two-dimensional tables
► Most widely used database structure today
► Examples include IBM DB2, Oracle Database, and Microsoft Access®

Databases
Concept Check
If a company stores data in separate files in its different departmental locations and is able to update all files simultaneously, it would not have problems with
a. attributes.
b. data redundancy.
c. industrial espionage.
d. concurrency.

Databases
Concept Check
When the data contained in a database are stored in large, two-dimensional tables, the database is referred to as a
a. flat file database.
b. hierarchical database.
c. network database.
d. relational database.

Databases
Concept Check
Database management systems are categorized by the data structures they support. In which type of database management system is the data arranged in a series of tables?
a. Network
b. Hierarchical
c. Relational
d. Sequential

The Need for Normalized Data
Relational databases consist of several small tables. Small tables can be joined in ways that represent relationships among the data.
Exhibit 13-6 Relational Database in Microsoft Access
Bolded field is the primary key.
SO 5 The need for normalization of data in a relational database

The Need for Normalized Data
A relational database offers flexibility in retrieving data. Structured query language (SQL) has become the industry standard.
    SELECT Customers.CustomerID, Customers.CompanyName, Orders.OrderID, Orders.ShippedDate
    FROM Customers
    INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Exhibit 13-7 Relational Database in Microsoft Access

The Need for Normalized Data
The process of converting data into tables that meet the definition of a relational database is called data normalization.
► There are seven rules of data normalization, and the rules are additive.
► Most relational databases are in third normal form.
► The first three rules of data normalization are:
1. Eliminate repeating groups.
2. Eliminate redundant data.
3. Eliminate columns not dependent on the primary key.
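
The join query and the idea of small, related tables can be tried out in a few lines. The following is a minimal sketch, assuming Python's built-in sqlite3 module rather than Microsoft Access. The table and column names come from the slide's query, but the column types, the helper function names, and every sample row are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Company names are stored once, keyed by CustomerID.
        CREATE TABLE Customers (
            CustomerID  TEXT PRIMARY KEY,
            CompanyName TEXT NOT NULL
        );
        -- Orders carry only the customer's key, not the company name.
        CREATE TABLE Orders (
            OrderID     INTEGER PRIMARY KEY,
            CustomerID  TEXT NOT NULL REFERENCES Customers(CustomerID),
            ShippedDate TEXT
        );
        """
    )
    # Hypothetical sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO Customers VALUES (?, ?)",
        [("C001", "Acme Supply"), ("C002", "Birch Outfitters")],
    )
    conn.executemany(
        "INSERT INTO Orders VALUES (?, ?, ?)",
        [(1, "C001", "2013-01-05"),
         (2, "C001", "2013-01-09"),
         (3, "C002", "2013-01-12")],
    )
    conn.commit()
    return conn


def customer_orders(conn):
    """Join the two tables on CustomerID, as in the slide's query."""
    query = """
        SELECT Customers.CustomerID, Customers.CompanyName,
               Orders.OrderID, Orders.ShippedDate
        FROM Customers
        INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
        ORDER BY Orders.OrderID;
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in customer_orders(build_demo_db()):
        print(row)
```

Because each company name is stored only once in Customers, renaming a customer means changing a single row. The join rebuilds the combined view when it is needed.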

The Need for Normalized Data
Trade-offs in Database Storage
Relational database
► Not the most efficient way to store data that will be used in other ways.
► Most organizations are willing to accept less transaction processing efficiency for better query opportunities.

The Need for Normalized Data
Concept Check
Which of the following statements is not true with regard to a relational database?
a. It is flexible and useful for unplanned, ad hoc queries.
b. It stores data in tables.
c. It stores data in a tree formation.
d. It is maintained on direct access devices.

Use of a Data Warehouse to Analyze Data
Management often needs data from several fiscal periods from across the whole organization.
Exhibit 13-8 The Data Warehouse and Operational Databases
SO 6 Data warehouse and the use of a data warehouse to analyze data

Use of a Data Warehouse to Analyze Data
► Build the data warehouse
► Identify the data
► Standardize the data
► Cleanse, or scrub, the data
► Upload the data

Use of a Data Warehouse to Analyze Data
Concept Check
A collection of several years’ nonvolatile data used to support strategic decision-making is a(n)
a. operational database.
b. data warehouse.
c. data mine.
d. what-if simulation.

Data Analysis Tools
Data mining is the process of searching for identifiable patterns in data that can be used to predict future behavior.
Online Analytical Processing (OLAP) is a set of software tools that allow online analysis of the data within a data warehouse.
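
As a rough illustration of the kind of analysis OLAP tools perform, the sketch below runs two operations commonly associated with OLAP, consolidation (rolling detail up into totals) and drill down (breaking a total back into detail). It uses Python's built-in sqlite3 module as a stand-in for a real data warehouse, which is an assumption made for brevity. The sales_fact table, its columns, the function names, and all figures are hypothetical.

```python
import sqlite3


def build_warehouse():
    """Load a tiny, hypothetical multi-year sales table into memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_fact ("
        "fiscal_year INTEGER, region TEXT, store TEXT, amount REAL)"
    )
    # Invented figures spanning two fiscal periods.
    rows = [
        (2012, "West", "Store A", 100.0),
        (2012, "West", "Store B", 150.0),
        (2012, "East", "Store C", 200.0),
        (2013, "West", "Store A", 120.0),
        (2013, "East", "Store C", 180.0),
    ]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def consolidate_by_region(conn):
    """Consolidation: roll store-level sales up to one total per region."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales_fact "
        "GROUP BY region ORDER BY region"
    ).fetchall()


def drill_down(conn, region):
    """Drill down: break one region's total into year and store detail."""
    return conn.execute(
        "SELECT fiscal_year, store, SUM(amount) FROM sales_fact "
        "WHERE region = ? GROUP BY fiscal_year, store "
        "ORDER BY fiscal_year, store",
        (region,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    print(consolidate_by_region(conn))
    print(drill_down(conn, "West"))
```

Dedicated OLAP software typically adds interactive tools on top of queries like these. The sketch shows only the underlying aggregation.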
Analytical methods in OLAP usually include:
1. Drill down
2. Consolidation
3. Pivoting
4. Time series analysis
5. Exception reports
6. What-if simulations
SO 7 The use of OLAP and data mining as analysis tools

Data Analysis Tools
Concept Check
Data mining would be useful in all of the following situations except
a. identifying hidden patterns in customers’ buying habits.
b. assessing customer reactions to new products.
c. determining customers’ behavior patterns.
d. accessing customers’ payment histories.

Distributed Data Processing
Early days
► Centralized processing
► Centralized databases
Today’s IT Environment
► Distributed data processing (DDP)
► Distributed databases (DDB)
SO 8 Distributed databases and advantages of the use of distributed data

Real World
McDonald’s has restaurants, warehouses, and offices located throughout the world; yet its corporate headquarters is in Oak Brook, Illinois. If McDonald’s management decided that all data, including prices, must be stored in a database at corporate headquarters, what would have to happen when you order a cheeseburger at a McDonald’s in Los Angeles? The cash register system would have to read pricing data from the database in Oak Brook, Illinois. This would be inefficient for several reasons. First, each McDonald’s restaurant would be trying to read the same database simultaneously in order to fill customer orders all around the world. Each of the McDonald’s restaurants would need to be networked to that data in Illinois and would need to be able to read price data quickly in order to process the sale. This would generate so much network traffic that it would very likely overwhelm the network and computer system. In addition, if prices are stored only at corporate headquarters, it would become more difficult for each location to set its own prices.
Certainly, it would be much more efficient for McDonald’s to maintain pricing data at the local restaurants or in regional centers.

Distributed Data Processing
Distributing the processing and data offers the following advantages:
1. Reduced hardware cost
2. Improved responsiveness
3. Easier incremental growth
4. Increased user control and user involvement
5. Automatic integrated backup
The most popular type of distributed system is a client/server system.

Distributed Data Processing
Concept Check
A set of small databases where data are collected, processed, and stored on multiple computers within a network is a
a. centralized database.
b. distributed database.
c. flat file database.
d. high-impact process.

Cloud-Based Databases
Providers of cloud-based database services include companies like Amazon (Amazon Elastic Compute Cloud), Google (Google Cloud Storage), Microsoft (Windows Azure), and IBM (IBM SmartCloud).
A company can buy data storage from these providers. This arrangement is called Database as a Service (DaaS).
The cloud provider generally provides
► data storage space and
► software tools to manage and control the database.
SO 9 Cloud-based databases

Real World
The best-selling jet airplane of the Boeing Corporation is the 737. In 2011, Boeing rolled out a new function called “737 Explained,” a cloud-based database and application using Microsoft Azure cloud storage. This cloud database stores 20,000 high-resolution photos of the Boeing 737, which are accessible by the Boeing salespeople who may be traveling to any location in the world to seek customers. 737 Explained can show 360-degree tours of the airplane, as well as individual parts and features.
The director of marketing at Boeing said, “737 Explained is one of the best marketing tools I’ve seen because it allows us to show prospective customers the new features and improvements without bringing them to an airport.”

IT Controls for Data and Databases
To ensure integrity (completeness and accuracy) of data in the database, IT application controls should be used. These controls are
► input,
► processing, and
► output controls
such as
1. data validation,
2. control totals and reconciliation, and
3. reports that are analyzed by managers.
SO 10 Controls for data and databases

Ethical Issues Related to Data Collection
Ethical Responsibilities of the Company
Data collected and stored in databases in many instances consist of information that is private between the company and its customer.
Ten privacy practices for online companies:
1. Management
2. Notice
3. Choice and consent
4. Collection
5. Use and retention
6. Access
7. Disclosure to third parties
8. Security for privacy
9. Quality
10. Monitoring and enforcement
SO 11 Ethical issues related to data collection and storage, and their use in IT systems

Real World
No matter how extensive the controls in place, it is never possible to completely eliminate unauthorized access. In April of 2011, Netflix disclosed that it had fired an unnamed call center employee for stealing credit card information from customers he had spoken with on the phone. The company declined to disclose the number of customers affected. The “monitoring and enforcement” practice mentioned above is intended to help discover problems such as this and to fix them quickly.
In this case, a Netflix spokesperson said, “We do everything we can to safeguard our members’ personal data and privacy, and when there’s an issue like this, we deal with it swiftly and decisively.”

Ethical Issues Related to Data Collection
Ethical Responsibilities of Employees
Employees have an ethical obligation to avoid misuse of any private or personal data about customers.
There are no specific IT controls that would always prevent authorized employees from disclosing private information.

Ethical Issues Related to Data Collection
Ethical Responsibilities of Customers
Customers have an obligation to
► provide accurate and complete information.
► keep any known company information confidential.
► avoid improper use of data that they gain from accessing a database as a customer.

Real World
Near Lexington, Kentucky, the breeding and racing of thoroughbred horses is a significant industry. Tracking the bloodlines of the thoroughbreds used as studs in breeding is important information to those who breed and race these horses. During the 1970s, a company named Bloodstock began maintaining a database of stud horse and mare bloodlines and race handicapping data. Breeders and others could establish an account with Bloodstock and access this computer database in choosing a stud horse to use for breeding or for handicapping races. Eventually, this database became a Web-based resource called BRISNET. In 1997, someone began establishing and using fictitious customer accounts to access the BRISNET database. Over a period of months, this person accessed and downloaded BRISNET data.
He then posted these data to his own database and Website and began selling the data at prices below those charged by Bloodstock. Upon discovery of this unethical act, the United States Attorney of the district, surprisingly, declined to charge the violator with federal crimes. However, Bloodstock settled out of court with the violator for an undisclosed dollar amount.

Ethical Issues Related to Data Collection
Concept Check
Each of the following is an online privacy practice recommended by the AICPA Trust Services Principles Privacy Framework except:
a. Redundant data should be eliminated from the database.
b. Notification of privacy policies should be given to customers.
c. Private information should not be given to third parties without the customer’s consent.
d. All of the above.

Copyright
Copyright © 2013 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the copyright owner is unlawful. Request for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these programs or from the use of the information contained herein.