* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Topics for Today (10/13/99)
Survey
Document related concepts
Entity–attribute–value model wikipedia , lookup
Expense and cost recovery system (ECRS) wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Clusterpoint wikipedia , lookup
Data center wikipedia , lookup
Forecasting wikipedia , lookup
Data analysis wikipedia , lookup
Information privacy law wikipedia , lookup
Database model wikipedia , lookup
3D optical data storage wikipedia , lookup
Transcript
Topics about Data Warehouses What is a data warehouse? How does a data warehouse differ from a transaction processing database? What are the characteristics of a data warehouse? What are the components of a data warehousing system? How is a data warehouse created? How is a data warehouse accessed? 1 TPS vs. DSS Issue TPS/MIS DSS Definition Systems to support day-today operations. Systems to support ad-hoc decision making. Users clerks, data entry, managers, analysts, low-level support staff, supervisors. researchers. Design goal Performance. Flexibility, ease of use, ease of access. Transaction Type Updates. Queries. Query Activity low; few joins. high; many joins. Transaction vs. DSS databases Issue Content Transaction database Internal data, process-oriented. DSS database Internal and external data. Subject-oriented. Data currency Real time. Batch. Current. Historical. Volatile. Non-volatile. Summary level Details of transactions; no (or very little) derived data. Summarized; many aggregation levels. Volume Megabytes to gigabytes. Gigabytes to terabytes. Design Normalized to prevent anomalies. Denormalized to enhance query performance. So, can one database support both transaction processing and decision support applications? Yes No What is a data warehouse? A data warehouse is a database designed to support a decision support system. A data warehouse is: Integrated: It is a centralized, consolidated database integrating data from an entire organization. Subject-oriented: Data warehouse data are organized around key subjects. The data are usually arranged by topic, such as customers, products, suppliers, etc. Time-variant: Data in the warehouse contain a time dimension so that they may be used as a historical aggregation. Non-volatile: Once data enter, they seldom leave. Data are appended rather than overwritten. Data are updated in batches. CustomerType PK CustomerTypeID Description Data warehouse design example SalesFact Employee PK EmployeeID Name PK PK PK PK,FK4 PK,FK1 PK,FK2 PK,FK3 Day Month Year ProductID CustomerTypeID EmployeeID LocationID SalesDollars #ofSales Product PK ProductID Description Location PK LocationID Description Issues in designing a data warehouse Must have a predefined subject focus. Has the potential to be very large – must define the “grain” or granularity level of storage. Will always have a dimension of time. Will contain derived data. Will be a summary of data, rather than each detailed transaction. Does not always adhere to standard normalization rules. 7 Customer Transaction Database Order Transaction Database Product Transaction Database Data Scrubbing Data Scrubbing Data Scrubbing Data Extraction Data Extraction Data Extraction Data Integration Creating a Data Warehouse Sales Data Warehouse Issues in creating a data warehouse How to get accurate and complete data? How to consolidate data? Differing data meanings. Differing storage mechanisms. Differing data formats. 9 Components of a data warehousing system Data store. Extraction/filtering/transformation processes. End user query tools. End user visualization tools. 10 Two-tier data warehouse architecture Data warehouse server Operational database Transformation process Operational database Summarized data External data s ource EDM Data warehous e User departments Three-tier data warehouse architecture Data warehouse server Operational database Transformation process Data mart tier Extraction process Data mart Operational database Summarized data External data s ource EDM Data warehous e Data mart User departments Accessing a data warehouse Visualization tools. Graphical. Spreadsheet format - usually Excel or Lotus look-andfeel. Dashboard. Example: http://tomcat.corda.com/superstore/sr.jsp Query tools. OLAP: Online analytical processing. Data mining: Artificial intelligence based query methods. 13 Online analytical processing Provides multi-dimensional data analysis techniques. Works primarily with data aggregation. Provides advanced statistical analysis. Provides advanced graphical output. Supports access to very large databases. Provides enhanced query optimization algorithms. Lots of acronyms: OLAP, ROLAP, MOLAP, HOLAP. Can be add-ons to existing products, example is Excel. Can have their own user interfaces. 14 OLAP vs. Data Mining questions OLAP Which customers spent the most with us in the past year? Data Mining Which types of customers are likely to spend the most with us in the coming year? How much did the bank What are the lose from loan defaulters characteristics of the within the past two years? customers most likely to default on their loans before the year is over? What were the highest selling fashion items in our London stores? What additional products are most likely to be sold to customers who buy shorts? Which store/location made the highest sales in the past year? In which area whould we open a new store next year? Data mining Data mining tools: analyze the data; uncover patterns hidden in the data; form computer models based on the findings; and use the models to predict business behavior. Proactive tools. Based on artificial intelligence software such as decision trees, neural networks, fuzzy logic systems, inductive nets and classification networking. 16 What are some applications of data warehousing? Customer relationship management. Business process management. Order management. Strategic decision analysis. 17