* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Lesson15 Data_Warehousing
Survey
Document related concepts
Expense and cost recovery system (ECRS) wikipedia , lookup
Operational transformation wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Clusterpoint wikipedia , lookup
Data center wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Data analysis wikipedia , lookup
Forecasting wikipedia , lookup
Relational model wikipedia , lookup
Information privacy law wikipedia , lookup
3D optical data storage wikipedia , lookup
Business intelligence wikipedia , lookup
Transcript
Data Warehouses Decision Support Advanced Database Dr. AlaaEddin Almabhouh Topic & Structure of Lesson In this lecture we will be looking at: Decision making Data Warehousing Definition Applications Technologies Slide 2 (of 63) The Evolution of Data Warehousing Since 1970s, organizations gained competitive advantage through systems that automate business processes to offer more efficient services to the customer. This resulted in accumulation of growing amounts of data in operational databases. Organizations now focus on ways to use operational data to support decision-making, as a means of gaining competitive advantage. However, operational systems were never designed to support such business activities. Organizations need to turn their archives of data into a source of knowledge, so that a single integrated / consolidated view of the organization’s data is presented to the user. A data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources. Slide 24 (of 63) Types of Data Sources Operational Databases Commercial Databases Physical Documents Internet / WWW “Desktop excel files” Any electronic document ? Slide 7 (of 63) Types of Data Operational Data (OLTP applications) Online Transaction Processing OLTP refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. Data that “works” Frequent updates and queries Normalized for efficient search and updates Point Queries: Does OLTP aid decision making ? Slide 3 (of 63) Types of Data Historical Data (OLAP applications) Online Analytical Processing OLAP is a data structure that allows fast analysis of data. It can also be defined as the capability of analyzing data from multiple perspectives. Data that “tells” Very frequent updates Integrated data set with global relevance Analytical queries that required huge amounts of data Performance issues mainly in query response time (not in updates) Slide 3 (of 63) Example of OLTP Queries What is the salary of Mr.Adam? What is the address and phone number of the person in charge of the Supplies department? How many employees working in UK branch? Slide 3 (of 63) Example of OLAP Queries How is the employee attrition scene changing over the years across the company? Is there a correlation between the geographical location of a company unit and excellent employee appraisals? Is it financially viable to continue our manufacturing unit in Malaysia? Slide 3 (of 63) What is a Data Warehouse? A central database that is loaded from multiple operational databases for the purpose of end-user access and decision support. Data warehouses offer data integration solutions and improved access to timely, accurate and consistent data. Data warehouses also help in accessing, aggregating and analyzing large amounts of data from diverse sources to understand historical performance or behavior and to predict and manage outcomes. Slide 24 (of 63) Definition Data Warehousing Bill Inmon "A subject-oriented, integrated, timevariant, non-volatile collection of data in support of management's decision -making process" Slide 25 (of 63) Definition (Cont ...) Subject-Oriented: The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales). Integrated: The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent. Slide 25 (of 63) Definition (Cont ...) Time-Variant: Data in the warehouse is only accurate and valid at some point in time or over some time interval. Non-Volatile: Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis. Slide 25 (of 63) Data Warehouse vs. Operational DB Systems Operational Systems Data Warehouse Systems Used by management Used by front-line workers Strategic value Tactical value Supports strategic direction Used for on-line analysis Supports day-to-day operation Used for transaction processing Subject oriented Application oriented Stores historical data Stores current data only Slide 27 (of 63) Data Warehouse Architecture Operational /information data Sales Database Data Warehouse Costumer Database •Report/ Query tools Metadata Product Database •OLAP tools Data extraction Data tranforming Data load Data that describes d/w. (eg: •Data mining tools info about data sources, rules, audit trails Major Building Blocks (Components) of the Data Warehouse 1. 2. 3. 4. 5. 6. Source data component (operation data store) Data staging component Data storage component Information delivery component Metadata component Management and control component Definition of Data Mart (or Datamart) A database, or collection of databases, designed to help managers make strategic decisions about their business. Whereas a data warehouse combines databases across an entire enterprise, data marts are usually smaller and focus on a particular subject or department. Some data marts, called dependent data marts, are subsets of larger data warehouses. Slide 41 (of 63) Architecture See other handout Slide 43 (of 63) Q &A Slide 81 (of 82) Data Warehouse Dimensional Modeling Design Advanced Databse Topics Levels of Modeling Data Warehouse Modeling The General Approach – The Star Schema Development The Database Component of a Data Warehouse – Fact Table and Dimension Table A Case Study Databases & Modeling Type of Database Relational Database Constructs ERD Characteristics Row/ Column Data Warehouse Dimensional Modeling Cube Distributed Database Distributed Component Object Model Client Object (DCOM) Class Diagram Object Object-Oriented Database Object = Data + Operations(Services); Entity = Data only New Trend OLAP DW XML UML Data Warehouse Modeling • • • Also called Dimensional Modelling Ways to derive the database component of a data warehouse Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables. Example: Data Warehouse Modeling Fact Table Hotel Hotel_No Key Hotel Desc Hotel name Room_no key Single Double Family Times Sales Hotel_No Key Guest Key Time Key YTD_Sales_dollars_by_hotel YTD_Sales_dollar_by_Type YTD_Sales_By_Business YTD_Sales_by_non-business time key day of week quarter year Guest Profile Profile key Profile desc Territory Age category Income category Demographics Demographic Key Cluster 1 Population Cluster 2 Population The General Approach Create the high level enterprise ERD Develop logical data model for subject area only Create data warehouse data model DM Develop physical data model Dimensional Modelling Modelling technique that aims to present the data in a standard, intuitive form that allows for high-performance access. Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. Dimensional Modelling All natural keys are replaced with surrogate keys (branch Id instead of branch #). Means that every join between fact and dimension tables is based on surrogate (intelligence) keys, not natural keys. Surrogate keys allows data in the warehouse to have some independence from the data used and produced by the OLTP systems. Dimensional Modelling Bulk of data in data warehouse is in fact tables, which can be extremely large. Important to treat fact data as read-only reference data that will not change over time. Most useful fact tables contain one or more numerical measures, or ‘facts’ that occur for each record and are numeric and additive. Fact Table and Dimension Table Fact Table: A Fact Table is a table in a relational database with a multi-part key. Each element of the key is itself a foreign key to a single dimension tale. Dimension Tables They are the constraints used in forming the fact table. Inside A Dimension Table Dimension table key. Primary key uniquely identifies each row in the table. Table is wide. Typically, a dimension table has many columns or attributes. Textual attributes. Dimension tables usually contain descriptive textual information. Inside A Dimension Table (Cont’d) Drilling down, rolling up. The attributes in a dimension table provide the ability to get to the details from high levels of aggregation to lower levels of details. Multiple Hierarchies. Dimension tables often provide for multiple hierarchies, so that drilling down may be performed along any of the multiple hierarchies. Few number of record. A dimension table typically has fewer number of records or rows than the fact table. Inside The Fact Table Concatenated Key. A row in the fact table relates to a combination of rows from all the dimension tables. Fully Additive Measures. The values of the attributes can be summed up by simple additions. Semi-additive Measures. Derived attributes such as percentages are not additive. They are known as semi-additive measures. Inside The Fact Table (Cont’d) Table Deep, not Wide. Typically a fact table has fewer attributes than a dimension table. But the number of records in a fact table is very large in comparison. Sparse Data. There are rows with null measures such as the date representing a closed holiday. In this case, there is no need to keep these rows. Degenerate Dimensions. Examples of such attributes are reference numbers like order numbers, invoice numbers, order line numbers, and so on. OLAP Cubes A data warehouse is based on a multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions Dimension tables, such as item (item_name, brand, type), or time (day, week, month, quarter, year) Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables 33 MOLAP Operations Roll up (drill-up): summarize data by climbing up hierarchy or by dimension reduction Drill down (roll down): reverse of roll-up from higher level summary to lower level summary or detailed data, or introducing new dimensions 34 Figure: Slicing a data cube 35 Star Schema vs. Snowflake Schema Star Schema (or Star Joint Schema) “A specific organization of a database in which a fact table with a composite key is joined to a number of single-level dimension tables, each with a single, primary key” Snowflake Schema A variant of the star schema where each dimension can have its dimensions. Starflake schema is a hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) schemas. Allows dimensions to be present in both forms to cater for different query requirements. -- Kimball Ralph, Data Warehouse Toolkit --- A STAR SCHEMA for Auto Sales Product Auto Sale Time Payment method Dealer Customer Demographics Facts: Actual sale price, Options price, Full price, Dealer add-on, Dealer credit, Dealer invoice, Down payment , Proceeds, Finance vs. Dimension Tables below Time Product Payment Method Customer Demographics Dealer Year Model Name Finance Type Age Dealer name Quarter Model Year Term (months) Gender City Month Package styling Interest rate Income range State Date Product category Agent Marital status Zone Day of week Exterior color Household size Day of month Interior color Home value Season Holiday flag Own or rent A Star Join Schema For A Food Cooperative Fact Table Times Food Item Food Item Key Food Item Desc Qty Dimension tables Member Profile Profile key Profile desc Territory Age category Income category Sales Food Item Key Profile Key Time Key YTD_Sales_dollars YTD_Sales_qty Demographics Demographic Key Cluster 1 Population time key day of week quarter year Time-series Dimension table Star Schema for Property Sales Time Fact Table Time Id (PK) PropertySale TimeId key Propertyid key Branchid key Clinetid key Promotionid key Staffid key Ownerid key Day week Quarter year Branch Branchid (PK) Promotion Promotionid (PK) PropertyforSale Propertyid (PK) Client Clientid (PK) Staff Owner Ownerid (PK) Staffid (PK) Fact and Dimension Tables for each Business Process of Property Sales Business Process Fact Tables Dimension Tables Property Sales PropertySale Time, Branch Staff, PropertyForSale, owner, ClientBuyer, Promotion Property Rentals Lease Time, Branch, Staff, PropertyForRent, owner, ClientRenter, Promotion Property Viewing PropertyViewing Time,Branch, PropertyForSale PropertyForRent, ClientBuyer ClientRenter Property Advertising Advert Time,Branch, PropertyForSale PropertyForRent, Promotion, Newspaper Property Maintenance PropertyMaintenan ce Time, Branch Staff, PropertyForRent Summary In this lecture we looked at: Data Warehouse Design Slide 62 (of 63) Other Resources http://office.microsoft.com/en-us/excel-help/data-miningadd-ins-HA010342915.aspx http://www.microsoft.com/enus/download/details.aspx?id=7294 http://www.microsoft.com/enus/download/details.aspx?id=29061 Q &A Slide 81 (of 82)