Data Warehousing
ISYS 650

What is a data warehouse?
• A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decision making.
  – Subject-oriented: data is organized around major subjects of the enterprise, such as sales, rather than individual transactions, and is oriented to decision making.
  – Integrated: the same piece of information collected from various systems is referred to in only one way.
    • Example: gender may be coded as M, F in one system, Male, Female in another, and Sex: 0, 1 in a third; the warehouse stores a single representation.
  – Nonvolatile: data is loaded into the data warehouse on a scheduled basis and is not changed by day-to-day transactions.
  – Time-variant: historical data is kept to support time-series and trend analysis.

What is a Data Warehouse?
• A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
• “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is nonvolatile and relevant to some moment in time”

Need for Data Warehousing
• Separation of operational and informational systems and data for improved performance.

Types of Data in a DW
• Current detailed data: consistent at the time the data is extracted from the transaction system.
• Old detailed data: needs to be archived.
• Summarized data
• Metadata:
  – A directory of what is in the warehouse.
  – A guide to mapping data from the transaction database to the data warehouse.

Data Mart
• A departmental data warehouse that stores only relevant data
  – Dependent data mart: a subset that is created directly from a data warehouse
  – Independent data mart: a small data warehouse designed for a strategic business unit or a department

DW Framework
[Figure: DW framework. Data sources: legacy, POS, ERP, other OLTP/Web, external data. ETL process: select, extract, transform, integrate, load. Enterprise data warehouse with metadata. Replication to data marts (Finance, Engineering, Marketing, ...), with a no-data-marts option. Access through API/middleware. Applications (visualization): routine business reporting; data/text mining; OLAP, dashboard, Web; custom-built applications.]

Data Integration and the Extraction, Transformation, and Load (ETL) Process
[Figure: Extraction, transformation, and load (ETL). Sources: packaged application, legacy system, other internal applications, transient data source. Steps: extract, transform, cleanse, load. Targets: data warehouse, data mart.]

Representation of Data in DW
• Dimensional modeling – a retrieval-based system that supports high-volume query access
• Star schema – the most commonly used and the simplest style of dimensional modeling
  – Contains a fact table surrounded by and connected to several dimension tables
  – The fact table contains the numerical measures needed to perform decision analysis and query reporting, plus the keys that link it to the dimension tables
  – Dimension tables contain classification and aggregation information about the values in the fact table
• Snowflake schema – an extension of the star schema where the diagram resembles a snowflake in shape

Multidimensionality
• Multidimensionality: the ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
• Multidimensional presentation
  – Dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, country, or industry
  – Measures: money, sales volume, head count, inventory profit, actual versus forecast
  – Time: daily, weekly, monthly, quarterly, or yearly

Example: Northwind Database

Examples of Sales Analysis
• Total sales by Product
• Sales related to Customer:
  – Location: Sales by City, Country
• Sales related to Time:
  – Quarterly, monthly, yearly Sales
• Sales related to Employee

Analyze Sales Data: Detailed Business Data
• Total sales:
  – Amount of each detail line: Quantity*UnitPrice*(1-Discount), where Discount is stored as a fraction (for example, 0.15 for 15% off)
  – Sum(Quantity*UnitPrice*(1-Discount))
• Total quantity sold:
  – Sum(Quantity)
• Detailed business data:
  – Quantity*UnitPrice*(1-Discount)
  – Quantity

Dimensions for Data Analysis: Factors relevant to the detailed business data
• Analyze sales by:
  – Product, product category
  – Location: City, State, Country
  – Time:
    • Quarterly, yearly sales
  – Employee
  – And combinations of these dimensions:
    • Ex: Product and Location, Product and Time

Data Warehouse Design - Star Schema
• Dimension tables – contain descriptions about the subjects of the business such as customers, employees, locations, products, time periods, etc.
• Fact table – contains detailed business data with links to dimension tables.

Define Product Dimension
• Product Table:
  – ProductID, ProductName, SupplierID, CategoryID, QuantityPerUnit, UnitPrice, UnitsInStock, UnitsOnOrder, ReorderLevel, Discontinued
• Product dimension table:
  – ProductID, ProductName, CategoryID

Define Employee Dimension
• Employees Table:
  – EmployeeID, LastName, FirstName, Title, TitleOfCourtesy, BirthDate, HireDate, Address, City, Region, PostalCode, Country, HomePhone, etc.
• Employee Dimension:
  – EmployeeID, FullName, Title, EmpCity

Define Location Dimension
• Customers table:
  – CustomerID, CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax
• Location dimension:
  – LocationCode, City, Country
  – Define Location Code: this is an artificial code created to link detailed business data with the city and country.
  – In the Northwind database, I used a Make Table query to create a Location table from the Customers table with the City and Country fields. Then I used the Location table’s design view to add a LocationCode field with the AutoNumber data type.
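The Location dimension steps above can also be sketched outside Access. The following is a minimal sketch, not the author's Access procedure: it uses Python's standard-library sqlite3 module, an auto-incrementing integer key stands in for the AutoNumber LocationCode, the helper name build_location_dimension is hypothetical, and the four customer rows are invented for illustration.

```python
import sqlite3


def build_location_dimension(conn):
    """Create a Location table with one row per distinct (City, Country).

    LocationCode is an auto-incrementing key, standing in for the
    Access AutoNumber field described in the text (an assumption).
    """
    conn.execute(
        "CREATE TABLE Location ("
        " LocationCode INTEGER PRIMARY KEY AUTOINCREMENT,"
        " City TEXT,"
        " Country TEXT)"
    )
    # Equivalent in spirit to a Make Table query over Customers
    # that keeps only the distinct City/Country pairs.
    conn.execute(
        "INSERT INTO Location (City, Country) "
        "SELECT DISTINCT City, Country FROM Customers "
        "ORDER BY Country, City"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID TEXT PRIMARY KEY,"
    " CompanyName TEXT,"
    " City TEXT,"
    " Country TEXT)"
)
# Invented sample rows; two customers share the same city.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        ("C1", "Alpha Co", "Berlin", "Germany"),
        ("C2", "Beta Co", "London", "UK"),
        ("C3", "Gamma Co", "London", "UK"),
        ("C4", "Delta Co", "Madrid", "Spain"),
    ],
)

build_location_dimension(conn)

locations = conn.execute(
    "SELECT LocationCode, City, Country FROM Location ORDER BY LocationCode"
).fetchall()

# Look up the LocationCode for each customer, which is how detailed
# business data would later be linked to the dimension.
customer_codes = dict(
    conn.execute(
        "SELECT c.CustomerID, l.LocationCode "
        "FROM Customers AS c "
        "JOIN Location AS l ON l.City = c.City AND l.Country = c.Country"
    ).fetchall()
)
```

Customers in the same city and country share one LocationCode, which is what lets the fact table carry a single small key instead of repeating the city and country text.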
Define Period Dimension
• Period:
  – In the Orders table: OrderDate
  – In the data warehouse we define Period to be: PeriodCode, Year, Quarter
    • OrderDate: 04-Jul-1996 -> 1996, 3, 7 (year, quarter, month)
    • OrderDate: 20-Dec-1996 -> 1996, 4, 12 (year, quarter, month)
  – In Access: create a view based on the Orders table
    • Year:Year(OrderDate); Month:Month(OrderDate)
    • Quarter: IIf([month]<=3,1,IIf([month]<=6,2,IIf([month]<=9,3,4)))
  – Define Period Code:
    • PeriodCode:Cstr(Year) + Cstr(Quarter)
    • 1996, 3, 7 -> 19963
    • 1996, 4, 12 -> 19964

Star Schema
• FactTable: LocationCode, PeriodCode, EmployeeID, ProductID, Qty, Amount
• Location Dimension: LocationCode, City, Country
• Period Dimension: PeriodCode, Year, Quarter
• Employee Dimension: EmployeeID, FullName, Title, EmpCity
• Product Dimension: ProductID, ProductName, CategoryID

A Query to retrieve data for Fact Table

Transfer Data Between Access Databases
• Create/Query/Design View
  – 1. Create the query with the data to transfer
  – 2. Click the Make Table button
    • Make table in the same database
    • Make table in another database
  – 3. Click Run
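The Period dimension rule and the fact table layout described earlier can be sketched in a few lines of Python. This is an illustrative sketch only, not the query from the original slide: quarter_of mirrors the nested IIf expression, period_code mirrors the Cstr concatenation, and build_fact_rows is a hypothetical helper that groups invented order-detail rows by the four dimension keys. It assumes Discount is stored as a fraction, so a line's amount is Quantity*UnitPrice*(1-Discount).

```python
from collections import defaultdict
from datetime import date


def quarter_of(month: int) -> int:
    """Same rule as IIf([month]<=3,1,IIf([month]<=6,2,IIf([month]<=9,3,4)))."""
    if month <= 3:
        return 1
    if month <= 6:
        return 2
    if month <= 9:
        return 3
    return 4


def period_code(order_date: date) -> str:
    """Concatenate year and quarter as text, e.g. 04-Jul-1996 -> '19963'."""
    return str(order_date.year) + str(quarter_of(order_date.month))


def build_fact_rows(details):
    """Aggregate detail lines into fact rows (hypothetical helper).

    Key:   (LocationCode, PeriodCode, EmployeeID, ProductID)
    Value: (Qty, Amount), where Amount assumes a fractional Discount.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for d in details:
        key = (
            d["LocationCode"],
            period_code(d["OrderDate"]),
            d["EmployeeID"],
            d["ProductID"],
        )
        totals[key][0] += d["Quantity"]
        totals[key][1] += d["Quantity"] * d["UnitPrice"] * (1 - d["Discount"])
    return {key: (qty, round(amount, 2)) for key, (qty, amount) in totals.items()}


# Invented sample detail lines, already joined to their dimension keys.
details = [
    {"LocationCode": 1, "OrderDate": date(1996, 7, 4), "EmployeeID": 5,
     "ProductID": 11, "Quantity": 12, "UnitPrice": 14.0, "Discount": 0.0},
    {"LocationCode": 1, "OrderDate": date(1996, 8, 15), "EmployeeID": 5,
     "ProductID": 11, "Quantity": 8, "UnitPrice": 14.0, "Discount": 0.25},
    {"LocationCode": 2, "OrderDate": date(1996, 12, 20), "EmployeeID": 3,
     "ProductID": 42, "Quantity": 10, "UnitPrice": 10.0, "Discount": 0.0},
]

facts = build_fact_rows(details)
for key, (qty, amount) in sorted(facts.items()):
    print(key, qty, amount)
```

The first two sample lines fall in the same quarter and share all four keys, so they collapse into a single fact row; in Access the same grouping would be done with a totals query before running Make Table.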