INFO5990 Professional Practice in IT
Lecture 12B: Online Analytical Processing
1. Business Intelligence (BI)
Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.
(Forrester Research, a market research company)
BI evolved from the decision support systems (DSS) of the 1960s to the mid-1980s (compare this with the development of computers over the same period).
Key components of a BI system:
1. a set of business performance indicators for monitoring and analyzing performance
2. a data warehouse containing both internal and external data
3. business analytic tools for manipulating, mining, and analyzing data
4. a user interface
2. Online analytical processing (OLAP)
OLAP is an approach to answering ad hoc multidimensional analytical queries. It is part of the broader category of business intelligence, which also encompasses reporting and data mining. Typical applications include sales and marketing reporting, management reporting, budgeting and forecasting, and financial reporting.
OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. Purpose-built databases configured for OLAP use a multidimensional data model, allowing complex analytical and ad hoc queries to execute rapidly. The resulting data structure is referred to as a data cube, or data hypercube, since in practice it usually has more than three dimensions.
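To make the cube idea concrete, here is a minimal sketch that assumes a plain Python dictionary can stand in for a purpose-built OLAP store: each cell is addressed by one coordinate per dimension. The dimension names, members and figures are hypothetical.

```python
# A toy data cube: each cell is addressed by one coordinate per dimension.
# Dimension names, members and figures are hypothetical illustrations.
DIMENSIONS = ("product", "month", "city")

cube = {
    ("boots", "2013-06", "Sydney"): 120,
    ("boots", "2013-07", "Sydney"): 95,
    ("sandals", "2013-06", "Sydney"): 40,
    ("boots", "2013-06", "Newtown"): 30,
}

def cell(cube, **coords):
    """Look up one cell by naming a member for every dimension."""
    key = tuple(coords[d] for d in DIMENSIONS)
    return cube.get(key, 0)  # an empty cell (no sales recorded) reads as 0

def total(cube, **fixed):
    """Sum every cell whose coordinates match the fixed dimension members."""
    positions = {DIMENSIONS.index(d): v for d, v in fixed.items()}
    return sum(
        value for key, value in cube.items()
        if all(key[i] == v for i, v in positions.items())
    )
```

Adding a fourth dimension (for example scenario) would simply lengthen each coordinate tuple, which is why the same structure is called a hypercube once it has more than three dimensions.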
2.1 Dimensions, hierarchies and measures
If a company wishes to summarize financial data by product, by time period and by city, and to compare actual and budgeted expenses, then product, time, city and scenario (actual or budget) are the dimensions of the data.
If we consider data about shoes, then store, category, make and size represent a hierarchy, each level being a ‘child’ of the level above it.
Each value stored, such as sales, profits, expenses, budget and forecast, quantity on hand, number sold or price, is called a measure. A salesperson ID, by contrast, identifies a member of a dimension rather than measuring anything.
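The distinction between dimensions, hierarchy levels and measures can be sketched as follows, assuming in-memory Python rows. The figures, the helper names `actual_vs_budget` and `totals_at_level`, and the shoe makes `MakeA`/`MakeB` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: product, quarter, city and scenario are dimensions;
# "amount" is the measure stored for each combination.
FACTS = [
    {"product": "boots",   "quarter": "Q1", "city": "Sydney",    "scenario": "actual", "amount": 500},
    {"product": "boots",   "quarter": "Q1", "city": "Sydney",    "scenario": "budget", "amount": 450},
    {"product": "sandals", "quarter": "Q1", "city": "Sydney",    "scenario": "actual", "amount": 200},
    {"product": "sandals", "quarter": "Q1", "city": "Sydney",    "scenario": "budget", "amount": 260},
    {"product": "boots",   "quarter": "Q1", "city": "Melbourne", "scenario": "actual", "amount": 300},
    {"product": "boots",   "quarter": "Q1", "city": "Melbourne", "scenario": "budget", "amount": 300},
]

def actual_vs_budget(facts, dimension):
    """Return {member: (actual, budget, actual - budget)} for one dimension."""
    sums = defaultdict(lambda: {"actual": 0, "budget": 0})
    for row in facts:
        sums[row[dimension]][row["scenario"]] += row["amount"]
    return {m: (s["actual"], s["budget"], s["actual"] - s["budget"]) for m, s in sums.items()}

# Shoe hierarchy from the text: each level is a child of the one before it.
HIERARCHY = ("store", "category", "make", "size")

def totals_at_level(rows, level, measure="units"):
    """Sum a measure for every member of one hierarchy level, keyed by its full path."""
    depth = HIERARCHY.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[h] for h in HIERARCHY[:depth])] += row[measure]
    return dict(totals)

SHOES = [
    {"store": "Newtown", "category": "boots", "make": "MakeA", "size": 8, "units": 3},
    {"store": "Newtown", "category": "boots", "make": "MakeA", "size": 9, "units": 5},
    {"store": "Newtown", "category": "boots", "make": "MakeB", "size": 9, "units": 2},
]
```

Keying each total by the whole path (store, category, make) keeps a child member tied to its parent, so two stores stocking the same make are not merged by accident.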
OLAP data is typically stored in a star schema or snowflake schema in a relational ‘data warehouse’, or in a special-purpose data management system.
Measures are derived from the records in the fact table, and dimensions are derived from the dimension tables.
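A minimal star schema can be tried out with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_store`) and the rows are hypothetical; the point is only that the measures (`units`, `revenue`) live in the fact table while the grouping labels come from the dimension tables.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        month      TEXT,
        units      INTEGER,  -- measure
        revenue    REAL      -- measure
    );
    INSERT INTO dim_product VALUES (1, 'Chelsea boot', 'boots'), (2, 'Slide', 'sandals');
    INSERT INTO dim_store   VALUES (10, 'Newtown'), (20, 'Parramatta');
    INSERT INTO fact_sales VALUES
        (1, 10, '2013-06', 5, 750.0),
        (1, 20, '2013-06', 2, 300.0),
        (2, 10, '2013-06', 8, 160.0);
""")

# Measures are summed from the fact table; category and city come from the dimensions.
query = """
    SELECT p.category, s.city, SUM(f.units) AS units, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
"""
rows = conn.execute(query).fetchall()
```

In a snowflake schema the dimension tables would themselves be normalised, for example by moving `category` into its own table referenced from `dim_product`.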
2.2 OLAP processes
OLAP includes three basic analytical operations:
Slicing and dicing: users can take out a specific subset of the OLAP cube by fixing one dimension to a single value (slicing), or select a sub-cube by restricting several dimensions (dicing), and view the result from different viewpoints.
Drill-down: allows users to retrieve the detail underlying particular values. For instance, users can view the sales of the individual products that make up a region’s sales.
Roll-up (consolidation): aggregates data along one or more dimensions. For example, the figures of all sales offices are rolled up to the sales department or sales division to anticipate sales trends.
A fourth operation, pivoting, rotates the cube in space to show a different ‘face’ of the data.
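The four operations can be sketched over a list of fact rows. This is an illustrative assumption rather than how an OLAP engine stores data; the dimension names, figures and helper functions (`slice_`, `dice`, `roll_up`, `drill_down`, `pivot`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (region, product, month) and one measure (sales).
FACTS = [
    {"region": "North", "product": "boots",   "month": "Jun", "sales": 120},
    {"region": "North", "product": "sandals", "month": "Jun", "sales": 30},
    {"region": "North", "product": "boots",   "month": "Jul", "sales": 90},
    {"region": "South", "product": "boots",   "month": "Jun", "sales": 60},
    {"region": "South", "product": "sandals", "month": "Jul", "sales": 80},
]

def roll_up(rows, dims, measure="sales"):
    """Roll-up: sum the measure per combination of the kept dimensions;
    every dimension not listed is aggregated away."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in rows if row[dim] == value]

def dice(rows, **allowed):
    """Dice: keep the sub-cube whose members lie in the allowed sets on several dimensions."""
    return [row for row in rows if all(row[d] in vals for d, vals in allowed.items())]

def drill_down(rows, dim, value, detail_dim):
    """Drill-down: break one aggregate (e.g. a region's sales) into its detail."""
    return roll_up(slice_(rows, dim, value), [detail_dim])

def pivot(rows, row_dim, col_dim, measure="sales"):
    """Pivot: show the same totals as a grid with row_dim down and col_dim across."""
    grid = defaultdict(lambda: defaultdict(int))
    for row in rows:
        grid[row[row_dim]][row[col_dim]] += row[measure]
    return {r: dict(cols) for r, cols in grid.items()}
```

Here `slice_` keeps the fixed dimension as a constant column; a real cube would drop it, leaving a sub-cube with one fewer dimension.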
2.3 OLAP data is not normalised
Unlike the data in operational databases, which is normalised to remove redundancy and data duplication, the data in a data cube is not normalised. Instead, it is stored so as to permit the rapid, flexible processing needed to answer the sort of ad hoc questions managers might ask, such as “Are we selling more Doc Martens boots at the Newtown store this winter than we did last year?” or “Which salesperson is responsible?”
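The advantage of denormalised rows for questions like these can be sketched in Python: because every row repeats the store, product, season and salesperson, each question reduces to a filter and a sum with no joins. The rows, names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical denormalised rows: all descriptive attributes are repeated on each row.
sales = [
    {"store": "Newtown", "product": "boots", "season": "winter", "year": 2012, "salesperson": "Ana", "units": 14},
    {"store": "Newtown", "product": "boots", "season": "winter", "year": 2013, "salesperson": "Ana", "units": 9},
    {"store": "Newtown", "product": "boots", "season": "winter", "year": 2013, "salesperson": "Ben", "units": 12},
    {"store": "Glebe",   "product": "boots", "season": "winter", "year": 2013, "salesperson": "Cy",  "units": 7},
]

def matches(row, where):
    return all(row[k] == v for k, v in where.items())

def units(rows, **where):
    """Total units over the rows matching every condition; no joins needed."""
    return sum(r["units"] for r in rows if matches(r, where))

def units_by(rows, dim, **where):
    """Total units per member of one dimension, over the matching rows."""
    totals = defaultdict(int)
    for r in rows:
        if matches(r, where):
            totals[r[dim]] += r["units"]
    return dict(totals)

NEWTOWN_BOOTS = {"store": "Newtown", "product": "boots", "season": "winter"}
this_winter = units(sales, year=2013, **NEWTOWN_BOOTS)
last_winter = units(sales, year=2012, **NEWTOWN_BOOTS)
by_person = units_by(sales, "salesperson", year=2013, **NEWTOWN_BOOTS)
top_seller = max(by_person, key=by_person.get)
```

The price of this speed is redundancy: renaming a store means updating every row that mentions it, which is exactly what normalisation in operational databases avoids.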
3. ETL tools: extract - transform - load
ETL tools provide a mechanism for integrating data from a range of diverse data sources into a single data cube, a process often called data warehousing. This involves:
• Extracting data from outside sources
• Transforming it to fit operational needs (which can include quality levels)
• Loading it into the end target (a database; more specifically, an operational data store, data mart or data warehouse)
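The three ETL steps can be sketched end to end with the Python standard library, assuming a CSV export as the outside source and an in-memory SQLite database as the end target. The column names and the single transformation shown are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source: a CSV export from an operational system.
SOURCE_CSV = """order_id,qty,unit_price
1,2,10.50
2,1,99.00
3,x,5.00
"""

def extract(text):
    """Extract: read raw rows (all values are strings) from the outside source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows that fail to parse and derive sale_amount."""
    out = []
    for r in rows:
        try:
            qty, price = int(r["qty"]), float(r["unit_price"])
        except ValueError:
            continue  # reject unparseable rows (here: qty = 'x')
        out.append((int(r["order_id"]), qty, price, qty * price))
    return out

def load(rows, conn):
    """Load: write the cleaned rows into the target warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, qty INTEGER, unit_price REAL, sale_amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
```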
G Kennedy 2013, ENGG1805 Professional Engineering and IT
3.1 Extract
• Each separate system may also use a different data organization/format. The extracted data is parsed to check whether it matches the expected pattern or structure; if it does not, it is rejected.
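Parsing extracted data against an expected structure might look like the following sketch, which assumes a hypothetical pipe-separated record layout (a 4-digit store code, an ISO date and a quantity) described by a regular expression.

```python
import re

# Hypothetical expected structure for one source system's records:
# 4-digit store code | YYYY-MM-DD date | integer quantity
RECORD_PATTERN = re.compile(r"^(\d{4})\|(\d{4}-\d{2}-\d{2})\|(\d+)$")

def parse_extract(lines):
    """Split raw lines into parsed records and rejected lines."""
    accepted, rejected = [], []
    for line in lines:
        m = RECORD_PATTERN.match(line.strip())
        if m:
            store, date, qty = m.groups()
            accepted.append({"store": store, "date": date, "qty": int(qty)})
        else:
            rejected.append(line)  # does not match the expected structure
    return accepted, rejected

raw = ["2042|2013-06-01|5", "2042,2013-06-02,3", "7|2013-06-03|1"]
ok, bad = parse_extract(raw)
```

A source that uses a different organisation (here, the comma-separated line) would need its own pattern rather than being forced through this one.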
3.2 Transform
• In some cases extracted data must be processed to remove ‘bad’ values (cleansing) or to transform it into an appropriate form for the target data warehouse. Typical transformations include:
• Cleansing: removing ‘bad’ values, e.g. wrong postcodes, non-numeric values
• Selecting: e.g. ignoring records where salary is not present (salary = null)
• Translating: e.g. if the source system stores 1 for male and 2 for female, but the warehouse stores M and F
• Encoding: e.g. mapping “Male” to “1”
• Deriving new values: e.g. sale_amount = qty * unit_price
• Sorting
• Removing duplicate values
• Validating: e.g. checking product codes
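Several of the transformations listed above can be combined in one pass, as in this sketch. The field names, the product master list `VALID_PRODUCTS`, and the rule that a valid postcode has exactly four digits (as Australian postcodes do) are assumptions for illustration; each raw row hypothetically carries both salary and sales fields so that every step has something to act on.

```python
import re

VALID_PRODUCTS = {"P-100", "P-200", "P-300"}  # hypothetical product master list
GENDER_CODES = {1: "M", 2: "F"}               # source code -> warehouse code
POSTCODE = re.compile(r"^\d{4}$")             # assumed rule: exactly four digits

RAW = [
    {"id": 3, "gender": 1, "postcode": "2042", "salary": 52000, "qty": 2, "unit_price": 10.0, "product": "P-100"},
    {"id": 1, "gender": 2, "postcode": "2000", "salary": 61000, "qty": 1, "unit_price": 5.0,  "product": "P-200"},
    {"id": 2, "gender": 2, "postcode": "20X2", "salary": 58000, "qty": 4, "unit_price": 2.5,  "product": "P-300"},  # bad postcode
    {"id": 4, "gender": 1, "postcode": "2010", "salary": None,  "qty": 3, "unit_price": 1.0,  "product": "P-100"},  # no salary
    {"id": 5, "gender": 1, "postcode": "2010", "salary": 48000, "qty": 3, "unit_price": 1.0,  "product": "BAD"},    # unknown product
    {"id": 3, "gender": 1, "postcode": "2042", "salary": 52000, "qty": 2, "unit_price": 10.0, "product": "P-100"},  # duplicate
]

def transform_rows(rows):
    out, seen = [], set()
    for r in rows:
        if r["salary"] is None:                       # selecting: skip salary = null
            continue
        if not POSTCODE.match(str(r["postcode"])):    # cleansing: drop wrong postcodes
            continue
        if r["product"] not in VALID_PRODUCTS:        # validating: check product codes
            continue
        clean = {
            "id": r["id"],
            "gender": GENDER_CODES[r["gender"]],      # translating: 1/2 -> M/F
            "postcode": r["postcode"],
            "salary": r["salary"],
            "product": r["product"],
            "sale_amount": r["qty"] * r["unit_price"],  # deriving a new value
        }
        key = tuple(sorted(clean.items()))            # removing duplicate records
        if key in seen:
            continue
        seen.add(key)
        out.append(clean)
    return sorted(out, key=lambda c: c["id"])         # sorting by id

result = transform_rows(RAW)
```

Encoding (e.g. mapping “Male” to “1”) would be another lookup table of the same kind as `GENDER_CODES`, applied in the opposite direction.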
3.3 Load
• Data is loaded into the end target, usually the data warehouse. Loading may overwrite existing data with the latest values or add new values, such as the latest sales. More complex systems can maintain a history and audit trail of all changes to the data.
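The difference between overwriting the latest values and keeping a history can be sketched with two SQLite tables: `latest_sales` holds one current row per store (SQLite's `INSERT OR REPLACE` overwrites on a primary-key clash), while `sales_history` appends every load as a simple audit trail. Table names and data are hypothetical.

```python
import sqlite3

def make_warehouse():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE latest_sales  (store TEXT PRIMARY KEY, total REAL);
        CREATE TABLE sales_history (store TEXT, total REAL, loaded_at TEXT);
    """)
    return conn

def load(conn, batch, loaded_at):
    """Overwrite the current value per store and append every load to the history."""
    conn.executemany(
        "INSERT OR REPLACE INTO latest_sales (store, total) VALUES (?, ?)", batch)
    conn.executemany(
        "INSERT INTO sales_history (store, total, loaded_at) VALUES (?, ?, ?)",
        [(store, total, loaded_at) for store, total in batch])
    conn.commit()

wh = make_warehouse()
load(wh, [("Newtown", 1200.0), ("Glebe", 800.0)], "2013-06-30")
load(wh, [("Newtown", 1350.0)], "2013-07-31")
```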
• Extra data, such as surrogate keys, sometimes has to be provided, e.g. where an organisation holds customer information in several departments and each department lists it in a different way, e.g. by name or by customer number. The ETL tool can consolidate these records into a uniform presentation, giving each customer a unique identifier.
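Consolidating customer records and assigning surrogate keys might be sketched like this. It assumes, hypothetically, that departments identify customers by a customer number, a name, or both, and that names match after collapsing spaces and ignoring case; real matching rules are usually more careful, and conflicting matches are not handled here.

```python
from itertools import count

def normalise_name(name):
    # Assumption: "raj  PATEL" and "Raj Patel" denote the same customer.
    return " ".join(name.split()).lower()

def consolidate(departments):
    """Give every distinct customer one surrogate key and merge their attributes."""
    next_key = count(1)
    alias_to_key = {}   # ("no", "C-17") or ("name", "jane smith") -> surrogate key
    customers = {}      # surrogate key -> merged attributes
    for dept, records in departments.items():
        for rec in records:
            aliases = []
            if "cust_no" in rec:
                aliases.append(("no", rec["cust_no"]))
            if "name" in rec:
                aliases.append(("name", normalise_name(rec["name"])))
            # Reuse the key of any alias already seen, otherwise issue a new one.
            key = next((alias_to_key[a] for a in aliases if a in alias_to_key), None)
            if key is None:
                key = next(next_key)
                customers[key] = {}
            for a in aliases:
                alias_to_key[a] = key
            for field, value in rec.items():
                customers[key].setdefault(field, value)  # first value seen wins
            customers[key].setdefault("sources", []).append(dept)
    return customers

departments = {
    "sales":     [{"cust_no": "C-17", "name": "Jane Smith"}, {"cust_no": "C-42", "name": "Raj Patel"}],
    "billing":   [{"cust_no": "C-17", "email": "jane@example.com"}],
    "marketing": [{"name": "raj  PATEL"}, {"name": "Li Wei"}],
}
result = consolidate(departments)
```

The surrogate key is a plain counter with no business meaning, so it stays stable even if a department later changes how it numbers or names its customers.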