Download OLTP , OLAP, Datawarehousing and Mining

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Business intelligence wikipedia , lookup

Transcript
OLTP , OLAP,
Datawarehousing and Mining
OLTP
• Online transaction processing, or OLTP,
refers to a class of systems that facilitate
and manage transaction-oriented
applications, typically for data entry,
storage and retrieval.
• They are designed to cover most of the
day to day operations of the organization
such as sales, purchasing, inventory,
manufacturing, payroll, accounting etc.
OLTP cont…
• OLTP is a methodology to provide end
users with access to large amounts of data
in an intuitive and rapid manner.
OLTP
• OLTP (On-line Transaction Processing) is
characterized by a large number of short on-line
transactions (INSERT, UPDATE, DELETE).
• The main emphasis for OLTP systems is put on very fast
query processing, maintaining data integrity in multiaccess environments and an effectiveness measured by
number of transactions per second.
• In OLTP database there is detailed and current data, and
schema used to store transactional databases is the
entity model(Based on codd rules).
Queries which OLTP can Handle
•
•
•
•
•
•
Search for a particular Customer
Retrieve an Employee Record
Display all products near expiry
Filter All product according to a supplier
Display reservation Info
And so on…..
Advantages of OLTP
• Simplicity : for non technical users
• Efficiency : Quick Read, Write and Update
• Fast Query Processing
OLTP Characteristics….
•
•
•
•
•
•
Current data
Short database transactions
Online update/insert/delete
Normalization is required
High volume transactions
Transaction recovery is necessary
Issues and Challenges
• Security
• Large Data base
• Not suitable for Decision making and
Strategic Information.
OLTP cannot Answer…
•
•
•
•
•
•
Which New Product to launch?
How much discount to offer?
What would be the projected sales?
Effect of removing a product?
Who is the Best performing Employee?
Competitive Analysis
Strategic Decisions
• All Managers and CEO’s want decision
making information or Strategic
Information.
• They want the answers of many questions:
– As In previous slide
Some Key decisions…
• Retail
– Customer Loyalty
– Market Planning
• Financial
– Customer Loyalty
– Fraud Detection
• Airlines
– Route Profitability
– Crew Management
• Manufacturing
– Cost Reduction
– Logistics Management
• Governance
– Manpower Planning
– Cost Control
• Planning
– Asset Management
– Resource Management
Some Business Objectives..
• Retain the customer base (Mobile
Companies)
• Increase Customer base
• Gain Market Share
• Improve Quality
• Enhance Customer Service
• Increase sales in north by 15%
Information Crisis
• Enterprise has lot of data
• OLTP cannot process data to produce strategic Info
– Companies are not lacking in data but the software they are
having are not able to process data in order to deliver strategic
Info.
• Managers want to review data from many point of view
–
–
–
–
–
–
By Product
By Salesperson
By City
By Month
By Consumer Group
And many more
Historic Decision Support Systems
•
•
•
•
•
•
Ad Hoc Reports
Special Extract Programs
Small Applications
Data Centers
Decision Support Software (Menu Driven)
Executive Information System
OLAP
– Analysis of information in a database for the
purpose of making management decisions
– It also refers to the technology used to
perform complex analysis of the data in
a data warehouse
Purpose of OLAP
• To derive summarized information from
large volume database
• To generate automated reports for human
view
OLAP Cont…
• OLAP (online analytical processing) is computer
processing that enables a user to easily and
selectively extract and view data from different
points of view.
• For example, a user can request that data be
analyzed to display a spreadsheet showing all of
a company's sports products sold in Delhi in the
month of July, compare revenue figures with
those for the same products in September, and
then see a comparison of other product sales in
Delhi in the same time period.
Formal Definition
– OLAP is a category of software technology
that enables analysts, managers, and
executives to gain insight into data through
fast, consistent, interactive access to a wide
variety of possible views of information that
has been transformed from raw data to reflect
the dimensionality of the enterprise as
understood by the user.
OLAP
• OLAP (On-line Analytical Processing) is
characterized by relatively low volume of
transactions. Queries are often very complex
and involve aggregations.
• For OLAP systems a response time is an
effectiveness measure. OLAP applications are
widely used by Data Mining techniques.
• In OLAP database there is aggregated, historical
data, stored in multi-dimensional schemas
(usually star schema).
OLAP
• Analyzes historical data (terabytes) using
complex queries
• Due to volume of data and complexity of
queries, OLAP often uses a data warehouse
• Data Warehouse - (offline) repository of
historical data generated from OLTP or other
sources
• Data Mining - use of warehouse data to
discover relationships that might influence
enterprise strategy
Examples - Supermarket
• OLTP
– Event is 3 cans of soup and 1 box of crackers
bought; update database to reflect that event
• OLAP
– Last winter in all stores in northeast, how many
customers bought soup and crackers together?
• Data Mining
– Are there any interesting combinations of foods that
customers frequently bought together?
OLAP Adv & Benefits…
• Enables Analysts, Executives and Managers to
gain useful insight from the presentation of data
• Allows data to be views from several dimensions
and perspectives
• Support Multidimensional Analysis
• Ability to drill down or roll up dimensions to
uncover significant aspects of data
• Capable of applying mathematical formulas and
statistical calculations
Benefits..
• Provides Fast response
• Supports Data Mining
• Highly Interactive Analysis
Benefits..
• What if Analysis
• Single Platform for all business needs
• Easily navigate multidimensional data to
perform unpredictable ad hoc queries and
to display the results in a variety of
interesting layouts
• Transparently manage business rules
across dimensions and cubes
Why need OLAP over Relational
Database I
• Consistently fast response
• OLAP obtains a consistently fast response
is by prestoring calculated values
Why need OLAP over Relational
Database II
• Metadata-based queries
• provide analysis functions that are difficult
or impossible to express in SQL
• SQL was developed primarily for
transaction systems, not for reporting
applications
Why need OLAP over Relational
Database III
• Spreadsheet-style formulas
• design the data structure with users in
mind.
• Spreadsheets are key components of
business management because they are
intuitive to create
Some Complex OLAP Ques?
• How do sales for our top 10 most profitable products
across Europe for this quarter compare with sales a year
ago?
• What are the differences in the product-sales mix
between the regions, relative to the global sales mix?
• What are our forecast units, unit price per service, unit
cost per product, sales, cost trends, and profit for the
next 12 months?
• In what ways does the mix vary by salesperson, and
what is the relative performance of our salespeople?
• What are the products making up 40% of our profit for
each region over time?
• These questions are more analytical and complex, and
the answer to one question often leads immediately to
another question as the user follows a train of thought in
researching a business problem or opportunity.
OLAP is designed to make it easy for end users to ask
these types of analytical questions without requiring:
– Assistance from the IT department
– Programming skills
– Technical knowledge about the organization of the database
• The results of queries also need to be rapid so that the
analyst’s train of thought is not interrupted and the value
of the analysis is not diminished.
• A typical multidimensional business query, would be
something like the following:
For each region of the world, what was the percentage
change in revenue for our top 20% products, over a
rolling three-month period this year compared to the
same period last year?
This simple business question describes both the data
that the user wants to examine and the structural form of
that data. Business users typically want to answer
questions that include terms such as what, where, who,
and when.
• For example, you find the following essential questions
embedded in the sample question:
• What products are selling best? (“…top 20%…”)
• Where are they selling? (“…each region of the world…”)
• When have they performed the best? (“…percentage
change in revenue…”)
• If you examine the query in detail it appears to be
translate into a very complex query. When I have
discussed this type of query during presentations at
conferences and with customers you can always spot
the DBAs because they immediately try to translate this
question into a SQL statement. Of course it is possible to
create a SQL query to answer this question.
• There are two calculations (percentage change in
revenue and rolling three-month total).
• There is a ranking element (the top 20% of products).
• There are aggregations (region level of the geography
dimension).
• There are multidimensional selections (specific products,
specific time periods, and specific regions).
• The result of the query is a multidimensional view of the
data (perhaps as a tabular display on the screen,
perhaps as a graph, or both).
• Now I think many people would agree this does in fact
look like a complex query. To try and frame this within a
traditional query and reporting tool would require
considerable skill. And don’t forget this is the starting
point for the analysis not the final result. The data
returned by this query will drive other even more
interesting and complex queries.
But the complexity of this query is a technical issue
rather than a business issue.
Therefore, a key goal of OLAP technology is to make it
very easy for end users to ask such questions about
their data without placing a burden on the IT department.
How Does OLAP Make This
Easy?
• Business users think dimensionally. By
design, OLAP technology stores,
processes, and presents data in a
dimensional way. The data model of OLAP
systems reflects the users’ picture of their
business data, making it easy to formulate
queries in business terms.
Key Component
• Multidimensional Data
– One Dimensional
– Two Dimensional
– Three Dimensional
– N – Dimensional