Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
BIS4435 – Data Warehousing Lecture 9 Dr. Nawaz Khan E-mail: [email protected] 1 Data Warehousing: Reading Assignment Reading Suggestion: Lecture 9 Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison Wesley, 4th Edition, ISBN: 0321210255(chapters 31-33) Global campus materials on OASIS: http://oasis.mdx.ac.uk/ (unit 9) More Reading: Fundamentals of Database Systems. R. Elmasri and S. B. Navathe, 4th Edition, 2004, Addison-Wesley, ISBN 0-321-12226-7: Chapter 28 Data Warehousing, Data Mining, and OLAP, Alex Berson and Stephen J. Smith, McGraw-Hill, 1997, ISBN 0-07-006272-2: Chapters 6, 7 Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 2 Data Warehousing Outline Lecture 9 Definition Compare with operational systems Architecture Design issues - star schema Relation with DM Summary Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 3 Data Warehousing Definition What is a data warehouse? DW is an environment + facilities Lecture 9 Bring scattered data plant1 plant2 …... planti query data …... plantn Finished product warehouse query/ delivery Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 4 Data Warehousing Definition What is a data warehouse? Financial Department Human Resource Department …... R&D Department Lecture 9 Operational systems Data transformation DW Access tool users Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 5 Data Warehousing Compare with operational systems Operational DB systems Lecture 9 focus on day-to-day business - data structured around events run in OLTP environment support large number of transactions require quick respond - small, focused DB DW systems focus on business needs and requirements - data organised around trends and patterns in events run in off-line environment support complex queries, ad hoc and static reports - based on historical data Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 6 Data Warehousing Compare with operational systems system feature Lecture 9 Size Content Operational DW Small Large - history of business Small work areas Performance Speed - essential Tools Cross-functional subjects Better information Restrict Various flexible standard reporting tools transform/present data as intelligence Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 7 Data Warehousing Compare with operational systems Separation of an Operational and DW System Lecture 9 Minimises impact of reporting and complex query processing on operational systems Preserves operational data for re-use Manages data based on time, historical data available to users Provides a data store that can be modified to conform to the way the user views the data Unifies data, one version Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 8 Data Warehousing Compare with operational systems The need for DW Lecture 9 Consistent and quality data Cost reduction More timely data access Improved performance and productivity Two distinct types of reporting still require Operational systems derive notification style reports DW systems generate general information reports Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 9 Data Warehousing Architecture Overall Architecture Lecture 9 Operational data and processing is separated from data warehouse processing DW is a central information repository surrounded by a number of components - environment Data Mart Data Transformation Data Warehouse Metadata Access Tools Information Delivery System Operational Data Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 10 Data Warehousing Architecture The DW DW database is a cornerstone of the environment It is implemented on RDBMS technology It should support large size, ad hoc query, user view Lecture 9 Data Transformation Significant effort on extracting data from operational system and putting it in a suitable format into DW system Functionality Removing unwanted data from operational databases Converting to common data names and definitions Calculating summaries and derived data Establishing defaults for missing data Accommodating source data definition changes Difficulties Database heterogeneity Data heterogeneity Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 11 12 Rules for Data Warehouse Lecture 9 Data warehouse and operational environments are separated Data are integrated Contains historical data Represent snapshot data at a given point in time Data are subject oriented Data are read-only Data warehouse life cycle is data driven Contain summarised data Read-only transactions involve Involves data transformation Meta data component is very critical Ensure optimum use of data by end users. Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 12 Data Warehousing Architecture Meta Data - data that describes the data warehouse Lecture 9 Description of the data model Description of the database design Definition of the system managing the data items A map of the data location in the DW, including its origin, how it is transformed/aggregated, where it went Specific database design definitions Data element definitions, including rules for derivations and summaries Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 13 Data Warehousing Architecture Meta Data (cont.) Lecture 9 Information Directory - metadata that helps users to interactively access to DW and understand content, find data A gateway to the DW environment Support easy distribution and replication of its content Searchable by business-oriented key words Act as launch platform for user data access and analysis tools Support information sharing Support a variety of scheduling options Support distribution of query results Provide interface to other applications Support end-user monitoring of the status of the DW environment Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 14 Data Warehousing Architecture Access Tools Query and reporting tools Lecture 9 Managed query tool Reporting tool Production reporting tool Desktop report writer Production reporting tool - Generate regular operational reports or support high volume batch jobs Report writer - designed for end users Managed query tool - a meta-layer between end-user and database, provides point-click creation of SQL, formats the query results into easy-to-read reports, or on-screen presentation Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 15 Data Warehousing Architecture Access Tools (cont.) Lecture 9 Application development tools - graphical data access environment EIS tools - high level summarisation OLAP - multidimensional DB Data mining tools Data visualisation tools - display complex relationships and patterns, techniques include 3-D imaging & sound, virtual reality Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 16 Data Warehousing Architecture Data Marts Lecture 9 Data store that is subsidiary to a data warehouse of integrated data It is created for the use of a dedicated group of users for a subject area It can be placed on the DW database In most instances, data mart is separated from the DW database and put on a separate database server Dependent data mart - data content is from the DW Independent data mart - alternative to the DW simple & inexpensive to build inconsistent - each has its own assumptions overlapping in data content, connectivity and management scalability problem Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 17 Data Warehousing Architecture Data Marts (cont.) Data integration issue - Ralph Kimball For any two data marts, common dimensions must conform to the equality or roll-up rule Lecture 9 Time Period Sales Products month week day Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 18 Data Warehousing Architecture Information Delivery System Lecture 9 It distributes data from warehouse to other DW and end-user products such as spreadsheets and local DBs (via Internet) Delivery is based on time or event Users receive report or an analytical view of data are not aware of location and maintenance Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 19 Data Warehousing Design issues - star schema Architecture Blueprint - mission, goals, objectives Logical Architecture Lecture 9 Data Architecture (data) Technology Architecture (hardware, software & network) Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] Application Architecture (tools) 20 Data Warehousing Design issues - star schema Star Schema Time Dimension Lecture 9 Product Dimension Other Dimension sales revenues Location Dimension Age Group Dimension It is used to model the data in a DW from decision-makers view of the business and operational aspects of the business It defines the join paths for accessing the facts of business It allows user to filter, aggregate, drill down & slice and dice the business fact Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 21 Data Warehousing Design issues - star schema Star Schema - 3 logical entities Measure entities – centre Lecture 9 Dimension entities – point Category (detail) entities - extended from point Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 22 Data Warehousing Design issues - star schema Measure entities – centre Lecture 9 Focus of the users’ query activity Factual information -> business intelligence Synonymous names are used - measures, analysis, indicators Quantitative data numerical information Data contained in measure entities grows large over time Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] Month 199901 199901 199901 … Branch ABC XYZ PQR Product COLA COLA COLA Sales forecast 200000 150000 125000 Sales actual 1900000 1550000 1050000 Variance -10000 50000 -20000 … 23 Data Warehousing Design issues - star schema Dimension entities – point Lecture 9 Allow users to browse measurement data from different angles - time, location, product ... Minimize the rows of data within a measure entity - filter All Location Canada France Germany Location dimension Country USA Eastern Area Western Area North-east Central South-east Region Region Region Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] Area Region 24 Data Warehousing Design issues - star schema Category (detail) entities - extended from point Lecture 9 Provide detailed information of a category within a dimension Textual/qualitative information CLIENT DETAILS All Clients (Dimension) Region Client State …. Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] CLIENT_KEY COMPANY_N AME ADDRESS POST_CODE CONTRY NAME PHONE 25 Data Warehousing Design issues - star schema Forming Star Schema: Star schema can be formed based on an information package, which is constructed during data gathering process Lecture 9 Category Dimension All Time Periods Year 5 Quarter 20 Month 60 All All Products All Age All Econ. All Locations Groups Classes Genders Country Classification Age Group Class Gender 20 8 8 10 3 Area Group 80 40 Region Product 400 200 District 2,000 Store 200,000 Measures/Facts: Forecast Sales, Budget Sales, Actual Sales, Forecast Variance (calc.), Budget Variance (calc.) Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 26 Data Warehousing Design issues - star schema Forming Star Schema: Define measure entity Lecture 9 lowest category within each dimension along with each of the measures/facts defines measure entity give a name to reflect the business purpose put in the centre of the star schema in a rectangle box Sales Analysis Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 27 Data Warehousing Design issues - star schema Forming Star Schema: Define dimension entity Lecture 9 each column of an information package defines the dimension entity place on the periphery of the star in a diamond shaped box consider relationship to the measure entity “measures based on dimension” Time Location Product Sales Analysis E-class Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] Age Gender 28 Data Warehousing Design issues - star schema Forming Star Schema: Define category entity Lecture 9 examine each individual cell of an information package to determine if it qualifies as a category detail entity category entities become extensions of dimension entities add to star schema in stop sign box Dimension All Time Periods Year 5 Quarter 60 Month 60 Category All Locations Country 20 Area 80 Region 400 District 2,000 Store 200,000 All Products Classification 8 Group 40 Product 200 All Age Groups Age Group 8 All Econ. Classes Class 10 All Genders Gender 3 Customer Product Store Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] BIS4227 - Online Database Systems 29 Data Warehousing Design issues - star schema Forming Star Schema: Tim e Lecture 9 Locatio n Product Details Produc t Sales Analysis Gende r Econ. Class Store details Ag e Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] Customer details 30 Data Warehouse Example: Operational Data Lecture 9 Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 31 Data Warehouse Example: Star Schema Lecture 9 Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 32 Data Warehouse Example: Summary Report Lecture 9 Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 33 Data Warehousing Relation with DM Lecture 9 They have the same purpose - decision support DW assembles, formats, and organises historical data to answer user query as it is - depends on content of DW DW will not attempt to extract further information neither will it predict trends and patterns from data DM will extract previously unknown and useful information as well as predict trends and patterns DM can be performed on DW and/or traditional DB DM: next lecture Dr. Nawaz Khan, School of Computing Science E-mail: [email protected] 34