Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data Warehousing and DSS at Group Health Cooperative 2-3 million data records are processed monthly How to use for decision support? How to hold down costs? How to improve customer service? How to utilize resource effectively? How to improve service quality? Answers Develop a comprehensive database (data warehouse) and DSS approach Very effective Data Warehousing, Access, Analysis and Visualization What to do with all the data that organizations collect, store and use? Information overload! Solution Data warehousing Data access Data mining Online analytical processing (OLAP) Data visualization Data sources Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ 4.3 The Nature and Sources of Data Data: Raw Information: Data organized to convey meaning Knowledge: Data items organized and processed to convey understanding, experience, accumulated learning, and expertise DSS Data Items – – – – – – Documents Pictures Maps Sound Animation Video Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ Data Sources Internal External Personal Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ The Internet and Commercial Database Services For External Data The Internet: Major supplier of external data Commercial Data “Banks”: Sell access to specialized databases Can add external data to the MSS in a timely manner and at a reasonable cost DSS In Focus 4.1: Sources of External Data--A Sampler M any sources of public data exist, some of which are available on the I nternet. Common data sources are: Federal Publications Survey of Current Business (Department of Commerce) (continues Business Conditions Digest in short form)--monthly, general business conditions Monthly Labor Review (Department of Labor)--monthly employment statistics (a journal with articles) Employment and Earnings (Department of Labor)--monthly, more detailed than M onthly Labor Review Other I nternational Monetary Fund--report of balance of payments, including currency rates, for participating countries Moody's--a series of manuals including abstracted information and balance sheets of most large U.S. corporations, intended for investors Standard & Poor's--periodically updated report of financial stability of most U.S. corporations Advertising Age--marketing newspaper, with a great deal of data on marketing Annual Editor & Publisher Market Guide--annual report of marketing information by SM SA (standard metropolitan statistical area) Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ TABLE 4.2 Representative Commercial Database (Data Bank) Services. CompuServe and The Source. Personal computer networks providing statistical data banks (business and financial market statistics) as well as bibliographic data banks (news, reference, library, and electronic encyclopedias). CompuServe is the largest supplier of such services to personal computer users. Compustat. Provides financial statistics about more than 12,000 corporations. Data Resources, Inc. offers statistical data banks in agriculture, banking, commodities, demographics, economics, energy, finance, insurance, international business, and the steel and transportation industries. DRI economists maintain a number of these data banks. Standard & Poor's is also a source. It offers services under the U.S. Central Data Bank. Dow Jones Information Service. Provides statistical data banks on stock market and other financial markets and activities, and in-depth financial statistics on all corporations listed on the New York and American stock exchanges, plus 800 other selected companies. Its Dow Jones News/Retrieval system provides bibliographic data banks on business, financial, and general news from The Wall Street Journal, Barron's, the Dow Jones News Service, Wall Street Week, and the 21-volume American Academic Encyclopedia. Interactive Data Corporation. A statistical data bank distributor covering agriculture, automobiles, banking, commodities, demographics, economics, energy, finance, international business, and insurance. Its main suppliers are Chase Econometric Associates, Standard & Poor's, and Value Line. Lockheed Information Systems. The largest bibliographic distributor. Its DIALOG system offers extracts and summaries of more than 150 different data banks in agriculture, business, economics, education, energy, engineering, environment, foundations, general new publications, government, international business, patents, pharmaceuticals, science, and social sciences. It relies on many economic research firms, trade associations, and governmental groups for data. Mead Data Central. This data bank service offers two major bibliographic data banks. Lexis provides legal research information and legal articles. Nexis provides a full-text (not abstract) bibliographic database of over 100 newspapers, magazines, and newsletters, news services, government documents, and so on. It includes full text and abstracts from the New York Times and the complete 29-volume Encyclopedia Britannica. Also provided is the Advertising & Marketing Intelligence (AMI) data bank, and the National Automated Accounting Research System. Source: Based on Standard & Poor's Compustat Services, Inc., statistics on 6,000 companies’ financial reports. The Internet/Web and Corporate Databases and Systems Use Web Browsers to Access vital information by employees and customers Implement executive information systems Implement group support systems (GSS) Database management systems provide data in HTML Web-browsers as DBMS front-ends Database Management Systems in DSS DBMS: Software program for entering (or adding) information into a database; updating, deleting, manipulating, storing, and retrieving information A DBMS combined with a modeling language is a typical system development pair, used in constructing DSS or MSS DBMS are designed to handle large amounts of information Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ Database Organization and Structure Relational Databases Hierarchical Databases Network Databases Object-oriented Databases Multimedia-based Databases Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ Data and Applications Application 1 OS Application 2 Application 3 Application 1 OS DBMS Application 2 Application 3 Application 1 OS O-O DBMS Application 2 Application 3 Traditional File Systems Advantages – simple data design to support single or small group of applications – fast data access – inexpensive Disadvantages – – – – lack of data relation redundancy lack of standards low application development productivity DBMS Advantages – – – – integration, sharing of data increased data accessibility minimized redundancy easier application development and maintenance – improved data security – logical/physical data independence Disadvantage – complex data design – slow access – expensive External view Users view of data Logical, integrated view Conceptual View • Data Definition Language • Data Manipulation Language • Query Language Internal view Physical View Physical storage structure of data 3 level DB Architecture Relational Database Invoice#, Inv.Date Customer#, Cname, Caddress Item, Item-Type, Item-Color, Item-Price, Quantity Customer Item Cust# CName CAddress 12398 Sid B. Ontario St Invoice Invoice# Cust# Inv.Date Item# Type Color Line Item Invoice# Item# Qty Price Relational Database Primary Key – duplicate rows not allowed – cannot have missing (NULL) value(s) for PK Foreign Key – defines relationship between tables – FK values either reference existing PK values or they are NULL (referential integrity) FK Customer Cust# CName CAddress 12398 Sid B. Ontario St Invoice Invoice# Cust# Inv.Date Relational Database Relational Operators – Select: subset of rows – Project: subset of columns – Join: creates new table by linking on common attributes Select Items with Price > $100 Show Item#, Type, and Price Item# Type Color Price I34 I459 I678 abc prq xyz Blue Red Purple $123.50 $57.23 $109.45 All Invoices for Customer Sid B. Customer Cust# CName 12398 Sid B. Invoice Invoice# Cust# Inv.Date CAddress Ontario St JOIN on Cust# 78912 79710 79923 12398 6/28/97 36743 8/15/97 12398 7/3/98 Data Warehousing Physical separation of operational and decision support environments Purpose: to establish a data repository making operational data accessible Uses TPS data needed for decision support Data are transformed and integrated into a consistent structure Data warehousing (or information warehousing): a solution to the data access problem End users perform ad hoc query, reporting analysis and visualization Data Warehouse: A Decision Support Focus DW technology A set of methods, techniques and tools that may be leveraged together to produce a vehicle that delivers data to end users on an integrated platform A framework to support the merging of operational data, informational data, external data, and personal data Issue is one of applying the technology to solve a business problem What is a data warehouse? Databases that support decision making and that are • subject oriented • time-variant • integrated • non volatile – organized around the essential business entities (customer, product, policy, claim, order, etc.) – contains data that has been cleansed, transformed, integrated – data organized by various time periods; often summarized on time; data is time-stamped – not updated in real time; not updated by users Order processing •2 second response time Daily c •Last 6 months orders Data Warehouse losed o r ders •Last 5 years data ry Product Price/inventory •10 second response time kly Wee •Last 10 price changes prod to nven I / e c ri uct p •Response time 2 seconds to 60 minutes •Data is not modified •Last 20 inventory transactions s ram g pro g tin e Marketing •30 second response time ekl e W ark m y •Last 2 years programs •Different performance requirements •Combine data from multiple applications •Data is mostly non-volatile •Data saved for a long time period Figure 3. Reasons for moving data outside the operations systems Order processing Customer orders Product price Available Inventory Data Warehouse Customers Product Price/inventory Product price Product Inventory Products Orders Product Inventory Product Price changes Product Price Marketing Customer Profile Product price Marketing programs •No data model restrictions of the source application •Data warehouse model has business entities Figure 5. Data warehouse entities align with the business structure Order Processing System Order Data Warehouse Orders (Closed) Inventory snapshot 1 Up Inventory snapshot 2 Inventory • Operational state information is not carried to the data warehouse • Data is transferred to the data warehouse after all state changes • Or, data is transferred with period snapshots Figure 6. Transformation of the operational state information Transformation Operational System A ----------------------cust, cust_id, borrower >> customer ID ----------------------“1” >> “M” “2” >> “F” Operational System B ----------------------Missing >>> “……..” Data Warehouse System Summarized Data Detailed Data •Uniform business terms •Single physical definition of an attribute •Consistent use of entity attributes •Default and missing values Figure 8. Physical transformation of application data Data Warehouse System Predefined reports and queries Summarized Data Queries against summary data Detailed Data Data mining in detail data Other Applications •Standard reports and queries •Queries against summary tables •Data mining in the detail data •Interface with other applications Figure 10. Data warehouse analysis processes What is a data warehouse? Kinds of information in data warehouse • old detail data • current detail data • lightly summarized data • highly summarized data • meta-data Meta Data “data on data” source, history, and many other aspects of data. Business meta data definitions,descriptions and rules used for reporting. Technical meta data structures and mapping rules for the data extraction and staging process. Allows information stored in warehouse to be used effectively for reporting and analysis, and ensure that all users have “one version of the truth”. Issue Operational Warehouse How built One application at a time One or more subject areas at a time Requirements Known Vague Critical to Daily business operation Management decisions Data access Smaller number of rows retrieved in a single transaction Tuned for frequent access to small amounts of data Volume needed for daily operation Large sets of data scanned to retrieve results High performance Flexibility Tuning Data volume Data retention Data currency Data availability Unit of work Design priority Tuned for infrequent access to large amounts of data Larger volume needed to support statistical analysis, forecasting, ad hoc reporting, querying To meet daily requirement Retained longer to support historical reporting, comparison, etc. Must be up to the minute Represents a static point in time; important that data does not change in real time High availability may be Same level of availability usually not needed needed. Small, manageable, Large, unpredictable, highly variable predictable unit of work unit of work Data Warehousing Benefits Increase in knowledge worker productivity Supports all decision makers’ data requirements Provide ready access to critical data Insulates operation databases from ad hoc processing Provides high-level summary information Provides drill down capabilities Yields – – – – – Improved business knowledge Competitive advantage Enhances customer service and satisfaction Facilitates decision making Help streamline business processes DW Suitability For organizations where Data are in different systems Information-based approach to management in use Large, diverse customer base Same data have different representations in different systems Highly technical, messy data formats Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ PC Files LAN Servers Mainframe OLTP Databases External Sources Data Loader Data Converter Data Scrubber Data Transformer Data Warehouse OLAP Server OLAP Interface Data Marts Data warehouse designed to meet the needs of a specific group of users Should (but may not) be designed with corporate standards and accessibility in mind – incorporate standards for hardware, software, networking, DBMS, naming conventions, etc. – vendor’s attempt to bypass IT and sell directly to end-users? Operational Data Store Used for operational processing, may be used to feed the DW An architectural construct that is • subject-oriented • integrated • volatile • current valued • comprised of only corporate detailed data Multiple applications may use the data, with updating in one place Effective in organizations trying to move legacy systems to integrated environment OLAP: Data Access and Mining, Querying and Analysis Online Analytical processing (OLAP) – DSS and EIS computing done by end-users in online systems – Versus online transaction processing (OLTP) OLAP Activities – Generating queries – Requesting ad hoc reports – Conducting statistical analyses OLAP (On-Line Analytical Processing) To gain insight into data through fast, interactive access to a wide variety of possible views of information that has been transformed from raw data view and analyze data across multiple dimensions allows flexible and easy “slicing and dicing” of data, drill down capabilities move from a general view to one which is more detailed (known as "drill-down"), or from a very detailed level to one which is more aggregated (“roll-up”). view data from a different perspective by introducing a completely different analysis criterion ("dicing" or changing view). OLAP Multidimensional OLAP vs. Relational OLAP – MOLAP: data stored in multi-dimensional arrays; use of sparse matrix techniques – ROLAP: data stored in relational DBMS; use of star-schema design ROLAP: Star Schema Design Dimension Table 1 Dimension Key 1 Description 1 Aggregation Lvl 1.1 Aggregation Lvl 1.2 Aggregation Lvl 1.3 Fact Table Dimension Key 1 Dimension Key 2 Dimension Key 3 ……. Fact 1 Fact 2 Fact 3 ……. Dimension Table 2 Dimension Key 2 Description 2 Aggregation Lvl 2.1 Aggregation Lvl 2.2 Aggregation Lvl 2.3 Dimension Table 3 Dimension Key 3 Description 3 Aggregation Lvl 3.1 Aggregation Lvl 3.2 Star-Schema example Sales Rep ID Sales Rep Name Store ID Store Name Store Location Distribution Channel Fact Table Sales Rep ID Product Code Cust Zip Code Customer Type Sales Period Date Total Qty Total $ Quota Qty Returned Qty Promotion Qty Product Code Product Name Category Product Type measures ZIP Code City State/Province Country Customer Type Cust Type Desc Cust Category Cust Category Desc Dimension Tables Dimension Tables Multi-dimensional data OLTP vs OLAP schema Company Ship-from PO Ship-to PO-item • Efficient create, update and processing of orders Item Company • Query: list purchases by companies, cost of items, source and destination Ship-from Purchases Ship-to Item Date Dimensional hierarchies Company #Units $-value Name City State Region Purchases Date Item Name Item-Type Day Week Month Qtr Product Category Year Query examples: •Purchases by Item •Purchases by Item and Date •Purchases by Item, Date and Company •Purchases by Item by Week •Purchases by Item-Type by Qtr •Purchases by Item-Type by State by Year Further OLAP queries Compare average purchase-$ in 1998 to that in 1997 Compare average monthly purchase-$ by Region What are the total 1st-quarter purchase-units by company over the past 5 years What is the deviation in weekly purchase-$ by company over the past year Model weekly purchase-units by Item-Category over the past 5 years Analytical queries - requirements beyond traditional SQL-type querying constructs OLAP uses the data warehouse and a set of tools, usually with multidimensional capabilities Query tools Spreadsheets Data mining tools Data visualization tools Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ D a ta S o u rc e s Bu sin ess Com m u n ica t ion Qu er yin g In t er n a l Da t a Sou r ces E xt er n a l Da t a Sou r ces Da t a Acqu isit ion , E xt r a ct ion , Deliver y Tr a n sfor m a t ion Repor t Gen er a t ion Da t a Wa r eh ou se Spr ea dsh eet F or eca st in g An a lysis Modelin g Da t a P r esen t a t ion and Visu a liza t ion Mu lt im edia E IS, Ot h er s On lin e An a lyt ica l P r ocessin g FIGURE 4.1 Data Warehousing and Online Analytical Processing (OLAP). Decision Support Systems and Intelligent Systems, Efraim Turban and Jay E. Aronson Copyright 1998, Prentice Hall, Upper Saddle River, NJ