INFO5990 Professional Practice in IT
Lecture 12B: Online Analytical Processing

1. Business Intelligence (BI)

Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making. (Forrester Research, a market research company)

BI evolved from the decision support systems of the 1960s to mid-1980s (compare with computer development during these years).

Key components of a BI system
1. a set of business performance indicators for monitoring and analyzing performance
2. a data warehouse containing both internal and external data
3. business analytic tools for manipulating, mining, and analyzing data
4. a user interface

2. Online analytical processing (OLAP)

OLAP is an approach to answering ad hoc multidimensional analytical queries. It is part of the broader category of business intelligence, which also encompasses reporting and data mining. Typical applications include sales and marketing reporting, management reporting, budgeting and forecasting, and financial reporting.

OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. Purpose-built databases configured for OLAP incorporate a multidimensional data model, allowing complex analytical and ad hoc queries to be executed rapidly. The data complex is referred to as a data cube, or data hypercube, since in reality it includes many dimensions.

2.1 Dimensions, hierarchies and measures

If a company wishes to summarize financial data by product, by time period, and by city, and to compare actual and budget expenses, then product, time, city and scenario (actual and budget) are the dimensions of the data.

If we consider data about shoes, then store, category, make and size represent a hierarchy, each level being a ‘child’ of the level before it (its parent).
Each value stored, such as sales, profits, expenses, budget and forecast, quantity on hand, number sold, price, or salesperson Id, is called a measure.

OLAP data is typically stored in a star schema or snowflake schema in a relational ‘data warehouse’, a special-purpose data management system. Measures are determined from records in the fact table, and dimensions are derived from the dimension tables.

2.2 OLAP processes

OLAP includes three basic analytical operations:
• Slicing and dicing: users can take out a specific set of data from the OLAP cube (slicing) or view some sub-cube (dicing) from different viewpoints.
• Drill-down: allows users to retrieve the detail underlying particular values. For instance, users can view the sales by individual products that make up a region’s sales.
• Roll-up (consolidation): involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends.

A fourth process is referred to as pivoting, i.e. rotating the cube in space to see a different ‘face’.

2.3 OLAP data is not normalised

Data in operational databases is normalised in order to remove redundancy and duplication. The data in a data cube, by contrast, is not normalised. It is stored so as to permit the rapid, flexible processing needed to answer the sort of ad hoc queries that managers might ask, such as “Are we selling more Doc Martens boots at the Newtown store this winter than we did last year?” or “Which salesperson is responsible?”

3. ETL tools: extract - transform - load

ETL tools provide a mechanism for integrating data from a range of diverse data sources into a single data cube, a process often called data warehousing.
This involves:
• Extracting data from outside sources
• Transforming it to fit operational needs (which can include quality levels)
• Loading it into the end target (a database, or more specifically an operational data store, data mart or data warehouse)

3.1 Extract

Each separate source system may use a different data organization or format. The extracted data is parsed to see whether it meets the expected pattern or structure. If not, it is rejected.

3.2 Transform

In some cases extracted data must be processed to remove ‘bad’ values (cleansing) or to transform it into an appropriate form for the target data warehouse. Typical transformations are:
• Cleansing: removing ‘bad’ values, e.g. wrong postcodes, non-numeric values
• Selecting: e.g. ignoring records where salary is not present (salary = null)
• Translating: e.g. if the source system stores 1 for male and 2 for female, but the warehouse stores M and F
• Encoding: e.g. mapping “Male” to “1”
• Deriving a new value: e.g. sale_amount = qty * unit_price
• Sorting
• Removing duplicate values
• Validating: e.g. checking product codes

3.3 Load

Data is loaded into the end target, usually the data warehouse. It may overwrite existing data with the latest values or create new values, such as the latest sales. More complex systems can maintain a history and audit trail of all changes to the data.

Extra data, such as surrogate keys, sometimes has to be provided. For example, an organisation may hold customer information in several departments, with each department listing it in a different way, e.g. by name or by customer number. The ETL tool can consolidate this into a uniform presentation, giving each record a unique identifier.

G Kennedy 2013, ENGG1805 Professional Engineering and IT
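
The transform and load steps of sections 3.2 and 3.3 can be sketched in a few lines of Python. This is only an illustration, not how any particular ETL tool works: the record layout, the product codes, the `transform` and `load` helpers and the in-memory `warehouse` dictionary are all assumptions made for the example.

```python
# Hypothetical records extracted from a source system (illustrative values only).
raw_rows = [
    {"product_code": "B100", "gender": 1, "qty": "2", "unit_price": "150.00"},
    {"product_code": "B100", "gender": 1, "qty": "2", "unit_price": "150.00"},  # exact duplicate
    {"product_code": "S200", "gender": 2, "qty": "x", "unit_price": "80.00"},   # non-numeric qty
    {"product_code": "ZZZZ", "gender": 2, "qty": "1", "unit_price": "80.00"},   # unknown product
    {"product_code": "S200", "gender": 2, "qty": "3", "unit_price": "80.00"},
]

VALID_PRODUCT_CODES = {"B100", "S200"}   # assumed reference list used for validation
GENDER_CODES = {1: "M", 2: "F"}          # translation: source codes -> warehouse codes


def transform(rows):
    """Validate, cleanse, de-duplicate, translate, derive and sort the rows."""
    seen = set()
    clean = []
    for row in rows:
        # Validating: reject product codes that are not in the reference list.
        if row["product_code"] not in VALID_PRODUCT_CODES:
            continue
        # Cleansing: reject rows whose numeric fields are missing or non-numeric.
        try:
            qty = int(row["qty"])
            unit_price = float(row["unit_price"])
        except (TypeError, ValueError):
            continue
        # Removing duplicates: skip a row identical to one already seen.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        clean.append({
            "product_code": row["product_code"],
            "gender": GENDER_CODES[row["gender"]],   # translating 1/2 to M/F
            "qty": qty,
            "unit_price": unit_price,
            "sale_amount": qty * unit_price,         # deriving a new value
        })
    # Sorting: order the output by product code.
    clean.sort(key=lambda r: r["product_code"])
    return clean


def load(rows, warehouse):
    """Append rows to the target, giving each one a surrogate key."""
    for row in rows:
        surrogate_key = len(warehouse) + 1
        warehouse[surrogate_key] = row
    return warehouse


warehouse = load(transform(raw_rows), {})
for key, row in warehouse.items():
    print(key, row)
```

Of the five input rows, only two survive the transform step here: the duplicate, the row with a non-numeric quantity and the row with an unknown product code are dropped. A real tool would normally log rejected rows rather than discard them silently.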
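
Returning to the OLAP operations of section 2.2, the following sketch shows slicing, dicing, roll-up and drill-down on a tiny fact table held as a Python list. The fact records, the dimension names and the helper functions `slice_cube`, `dice` and `roll_up` are hypothetical, and the sketch takes slicing to mean fixing one dimension to a single value and dicing to mean restricting several dimensions to chosen sets of values, which is one common reading of those terms.

```python
from collections import defaultdict

# Hypothetical fact records: three dimensions (store, category, year) and one measure (units).
facts = [
    {"store": "Newtown", "category": "boots",   "year": 2012, "units": 40},
    {"store": "Newtown", "category": "boots",   "year": 2013, "units": 55},
    {"store": "Newtown", "category": "sandals", "year": 2013, "units": 20},
    {"store": "Bondi",   "category": "boots",   "year": 2013, "units": 30},
    {"store": "Bondi",   "category": "sandals", "year": 2012, "units": 25},
]


def slice_cube(rows, dimension, value):
    """Slicing: keep only the rows where one dimension has a single given value."""
    return [r for r in rows if r[dimension] == value]


def dice(rows, **allowed):
    """Dicing: keep a sub-cube by restricting several dimensions to sets of values."""
    return [r for r in rows if all(r[d] in values for d, values in allowed.items())]


def roll_up(rows, dimensions, measure="units"):
    """Roll-up: total the measure for each combination of the given dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dimensions)] += r[measure]
    return dict(totals)


# Roll-up: total units per store.
by_store = roll_up(facts, ["store"])

# Drill-down: the same totals broken down one level further, by category.
by_store_and_category = roll_up(facts, ["store", "category"])

# The manager's question from section 2.3: boots at Newtown, year by year.
newtown_boots = slice_cube(slice_cube(facts, "store", "Newtown"), "category", "boots")
boots_by_year = roll_up(newtown_boots, ["year"])

# Dicing: a sub-cube covering boots only, in either store, for 2013.
sub_cube = dice(facts, category={"boots"}, year={2013})

print(by_store)
print(by_store_and_category)
print(boots_by_year)
print(sub_cube)
```

In this sketch drill-down is simply a roll-up over more dimensions, and pivoting would amount to listing the same dimensions in a different order. A real OLAP server precomputes and stores many of these aggregates so that such queries return quickly.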