OLTP, OLAP, Data Warehousing and Mining

OLTP
• Online transaction processing, or OLTP, refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry, storage and retrieval.
• They are designed to cover most of the day-to-day operations of the organization, such as sales, purchasing, inventory, manufacturing, payroll and accounting.

OLTP cont…
• OLTP is a methodology to provide end users with access to large amounts of data in an intuitive and rapid manner.

OLTP
• OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE).
• The main emphasis for OLTP systems is on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second.
• An OLTP database holds detailed and current data, and the schema used to store transactional databases is the entity model (based on Codd's rules).

Queries which OLTP can Handle
• Search for a particular customer
• Retrieve an employee record
• Display all products near expiry
• Filter all products according to a supplier
• Display reservation info
• And so on…

Advantages of OLTP
• Simplicity: for non-technical users
• Efficiency: quick read, write and update
• Fast query processing

OLTP Characteristics…
• Current data
• Short database transactions
• Online update/insert/delete
• Normalization is required
• High-volume transactions
• Transaction recovery is necessary

Issues and Challenges
• Security
• Large database
• Not suitable for decision making and strategic information

OLTP cannot Answer…
• Which new product to launch?
• How much discount to offer?
• What would be the projected sales?
• Effect of removing a product?
• Who is the best-performing employee?
• Competitive analysis

Strategic Decisions
• All managers and CEOs want decision-making information, or strategic information.
• They want answers to many questions:
  – As in the previous slide

Some Key Decisions…
• Retail
  – Customer Loyalty
  – Market Planning
• Financial
  – Customer Loyalty
  – Fraud Detection
• Airlines
  – Route Profitability
  – Crew Management
• Manufacturing
  – Cost Reduction
  – Logistics Management
• Governance
  – Manpower Planning
  – Cost Control
• Planning
  – Asset Management
  – Resource Management

Some Business Objectives..
• Retain the customer base (mobile companies)
• Increase the customer base
• Gain market share
• Improve quality
• Enhance customer service
• Increase sales in the north by 15%

Information Crisis
• The enterprise has a lot of data
• OLTP cannot process data to produce strategic information
  – Companies are not lacking in data, but the software they have is not able to process that data to deliver strategic information.
• Managers want to review data from many points of view
  – By product
  – By salesperson
  – By city
  – By month
  – By consumer group
  – And many more

Historic Decision Support Systems
• Ad hoc reports
• Special extract programs
• Small applications
• Data centers
• Decision support software (menu driven)
• Executive information system

OLAP
– Analysis of information in a database for the purpose of making management decisions
– It also refers to the technology used to perform complex analysis of the data in a data warehouse

Purpose of OLAP
• To derive summarized information from large-volume databases
• To generate automated reports for human view

OLAP Cont…
• OLAP (online analytical processing) is computer processing that enables a user to easily and selectively extract and view data from different points of view.
• For example, a user can request that data be analyzed to display a spreadsheet showing all of a company's sports products sold in Delhi in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Delhi in the same time period.
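The Delhi example above can be sketched in a few lines. This is only an illustration under stated assumptions: the `sales` table, its columns, the product names and all revenue figures are invented, and an in-memory SQLite database stands in for whatever store an OLAP tool would actually use.

```python
import sqlite3

# Hypothetical sales facts: (product, category, city, month, revenue).
# All names and figures are invented for illustration.
ROWS = [
    ("cricket bat", "sports", "Delhi", "July", 1200.0),
    ("cricket bat", "sports", "Delhi", "September", 900.0),
    ("football", "sports", "Delhi", "July", 500.0),
    ("football", "sports", "Delhi", "September", 650.0),
    ("football", "sports", "Mumbai", "July", 400.0),
    ("kettle", "kitchen", "Delhi", "July", 300.0),
    ("kettle", "kitchen", "Delhi", "September", 350.0),
]


def revenue_by_product(conn, category, city, month):
    """Return {product: total revenue} for one category, city and month."""
    cur = conn.execute(
        "SELECT product, SUM(revenue) FROM sales "
        "WHERE category = ? AND city = ? AND month = ? "
        "GROUP BY product ORDER BY product",
        (category, city, month),
    )
    return dict(cur.fetchall())


def compare_months(conn, category, city, first, second):
    """Pair each product's revenue in two months: {product: (first, second)}."""
    a = revenue_by_product(conn, category, city, first)
    b = revenue_by_product(conn, category, city, second)
    return {p: (a.get(p, 0.0), b.get(p, 0.0)) for p in sorted(set(a) | set(b))}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(product TEXT, category TEXT, city TEXT, month TEXT, revenue REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", ROWS)

# Sports products in Delhi, July versus September.
sports = compare_months(conn, "sports", "Delhi", "July", "September")
# The same comparison for another product group in the same city and period.
other = compare_months(conn, "kitchen", "Delhi", "July", "September")
print(sports)
print(other)
```

Each call changes only the selection (category, city, months), which is the "different points of view" idea: the same facts, sliced along different dimensions.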

Formal Definition
– OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the dimensionality of the enterprise as understood by the user.

OLAP
• OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations.
• For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used by data mining techniques.
• An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).

OLAP
• Analyzes historical data (terabytes) using complex queries
• Due to the volume of data and the complexity of queries, OLAP often uses a data warehouse
• Data Warehouse - (offline) repository of historical data generated from OLTP or other sources
• Data Mining - use of warehouse data to discover relationships that might influence enterprise strategy

Examples - Supermarket
• OLTP
  – Event is 3 cans of soup and 1 box of crackers bought; update database to reflect that event
• OLAP
  – Last winter in all stores in the northeast, how many customers bought soup and crackers together?
• Data Mining
  – Are there any interesting combinations of foods that customers frequently bought together?

OLAP Advantages & Benefits…
• Enables analysts, executives and managers to gain useful insight from the presentation of data
• Allows data to be viewed from several dimensions and perspectives
• Supports multidimensional analysis
• Ability to drill down or roll up dimensions to uncover significant aspects of data
• Capable of applying mathematical formulas and statistical calculations

Benefits..
• Provides fast response
• Supports data mining
• Highly interactive analysis

Benefits..
• What-if analysis
• Single platform for all business needs
• Easily navigate multidimensional data to perform unpredictable ad hoc queries and to display the results in a variety of interesting layouts
• Transparently manage business rules across dimensions and cubes

Why need OLAP over Relational Database I
• Consistently fast response
• OLAP obtains a consistently fast response by prestoring calculated values

Why need OLAP over Relational Database II
• Metadata-based queries
• Provide analysis functions that are difficult or impossible to express in SQL
• SQL was developed primarily for transaction systems, not for reporting applications

Why need OLAP over Relational Database III
• Spreadsheet-style formulas
• Design the data structure with users in mind
• Spreadsheets are key components of business management because they are intuitive to create

Some Complex OLAP Questions
• How do sales for our top 10 most profitable products across Europe for this quarter compare with sales a year ago?
• What are the differences in the product-sales mix between the regions, relative to the global sales mix?
• What are our forecast units, unit price per service, unit cost per product, sales, cost trends, and profit for the next 12 months?
• In what ways does the mix vary by salesperson, and what is the relative performance of our salespeople?
• What are the products making up 40% of our profit for each region over time?
• These questions are more analytical and complex, and the answer to one question often leads immediately to another question as the user follows a train of thought in researching a business problem or opportunity.
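To make one of the questions above concrete, here is a minimal sketch of "the products making up 40% of our profit for each region", restricted to a single period. The region names, product names and profit figures are invented, and `top_contributors` is a hypothetical helper, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical (region, product, profit) facts, invented for illustration.
PROFITS = [
    ("North", "A", 30.0), ("North", "B", 28.0),
    ("North", "C", 22.0), ("North", "D", 20.0),
    ("South", "A", 10.0), ("South", "B", 25.0), ("South", "C", 65.0),
]


def top_contributors(rows, share):
    """For each region, list the most profitable products, in descending
    order of profit, until their cumulative profit reaches `share` of the
    region's total profit."""
    by_region = defaultdict(list)
    for region, product, profit in rows:
        by_region[region].append((product, profit))

    result = {}
    for region, items in by_region.items():
        total = sum(profit for _, profit in items)
        running = 0.0
        chosen = []
        for product, profit in sorted(items, key=lambda x: x[1], reverse=True):
            if running >= share * total:
                break  # the chosen products already cover the requested share
            chosen.append(product)
            running += profit
        result[region] = chosen
    return result


print(top_contributors(PROFITS, 0.40))
```

Answering the "over time" part of the question would mean repeating this per period, which is exactly the kind of follow-up query the last bullet describes.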
OLAP is designed to make it easy for end users to ask these types of analytical questions without requiring:
  – Assistance from the IT department
  – Programming skills
  – Technical knowledge about the organization of the database
• The results of queries also need to be rapid so that the analyst’s train of thought is not interrupted and the value of the analysis is not diminished.
• A typical multidimensional business query would be something like the following: For each region of the world, what was the percentage change in revenue for our top 20% products, over a rolling three-month period this year compared to the same period last year? This simple business question describes both the data that the user wants to examine and the structural form of that data. Business users typically want to answer questions that include terms such as what, where, who, and when.
• For example, you find the following essential questions embedded in the sample question:
  – What products are selling best? (“…top 20%…”)
  – Where are they selling? (“…each region of the world…”)
  – When have they performed the best? (“…percentage change in revenue…”)
• If you examine the query in detail, it appears to translate into a very complex query. When I have discussed this type of query during presentations at conferences and with customers, you can always spot the DBAs because they immediately try to translate this question into a SQL statement. Of course it is possible to create a SQL query to answer this question.
• There are two calculations (percentage change in revenue and rolling three-month total).
• There is a ranking element (the top 20% of products).
• There are aggregations (region level of the geography dimension).
• There are multidimensional selections (specific products, specific time periods, and specific regions).
• The result of the query is a multidimensional view of the data (perhaps as a tabular display on the screen, perhaps as a graph, or both).
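The components listed above (selection, aggregation, ranking, calculation) can be sketched step by step. This is a rough illustration, not how an OLAP engine works internally. The fact layout, product names and figures are invented, and the sketch assumes that "top 20%" means the top fifth of products ranked by this year's revenue summed over all regions, which the sample question leaves open.

```python
import math
from collections import defaultdict

# Hypothetical monthly facts: (region, product, year, month, revenue).
# Five products, two regions, months 4-6 of two years. All figures invented:
# P1 grows by 25% in Europe and 50% in Asia, the other products stay flat.
FACTS = []
for region, scale, p1_growth in (("Europe", 1.0, 1.25), ("Asia", 2.0, 1.5)):
    for product, monthly in (("P1", 100.0), ("P2", 80.0), ("P3", 60.0),
                             ("P4", 40.0), ("P5", 20.0)):
        growth = p1_growth if product == "P1" else 1.0
        for month in (4, 5, 6):
            FACTS.append((region, product, 2023, month, monthly * scale))
            FACTS.append((region, product, 2024, month, monthly * scale * growth))


def rolling_total(facts, year, end_month, window=3):
    """Selection + aggregation: revenue per (region, product) over the
    `window` months of `year` that end at `end_month`."""
    months = range(end_month - window + 1, end_month + 1)
    totals = defaultdict(float)
    for region, product, y, m, revenue in facts:
        if y == year and m in months:
            totals[(region, product)] += revenue
    return totals


def top_products(totals, fraction):
    """Ranking: the top `fraction` of products by revenue over all regions."""
    per_product = defaultdict(float)
    for (_, product), revenue in totals.items():
        per_product[product] += revenue
    ranked = sorted(per_product, key=per_product.get, reverse=True)
    keep = max(1, math.ceil(fraction * len(ranked)))
    return set(ranked[:keep])


def pct_change_by_region(facts, year, end_month, fraction=0.2):
    """Calculation: percentage change in top-product revenue per region,
    this year's rolling window versus the same window last year."""
    this_year = rolling_total(facts, year, end_month)
    last_year = rolling_total(facts, year - 1, end_month)
    top = top_products(this_year, fraction)

    now, before = defaultdict(float), defaultdict(float)
    for (region, product), revenue in this_year.items():
        if product in top:
            now[region] += revenue
    for (region, product), revenue in last_year.items():
        if product in top:
            before[region] += revenue
    # Regions with no revenue last year are skipped to avoid dividing by zero.
    return {r: 100.0 * (now[r] - before[r]) / before[r] for r in now if before[r]}


print(pct_change_by_region(FACTS, 2024, 6))
```

Even in this toy form, a one-sentence business question needs three helper functions. That gap between the question and its implementation is the burden OLAP tools aim to take off the end user.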
• Now I think many people would agree this does in fact look like a complex query. To try to frame this within a traditional query and reporting tool would require considerable skill. And don’t forget, this is the starting point for the analysis, not the final result. The data returned by this query will drive other, even more interesting and complex queries. But the complexity of this query is a technical issue rather than a business issue. Therefore, a key goal of OLAP technology is to make it very easy for end users to ask such questions about their data without placing a burden on the IT department.

How Does OLAP Make This Easy?
• Business users think dimensionally. By design, OLAP technology stores, processes, and presents data in a dimensional way. The data model of OLAP systems reflects the users’ picture of their business data, making it easy to formulate queries in business terms.

Key Component
• Multidimensional Data
  – One Dimensional
  – Two Dimensional
  – Three Dimensional
  – N-Dimensional
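One way to picture the step from one to N dimensions is a small cube whose cells are keyed by one value per dimension. The sketch below is illustrative only: the dimension names, the cell values and the `roll_up` helper are assumptions made for this example, and real OLAP servers use far more compact storage.

```python
from collections import defaultdict

# Assumed dimension order for every cell key in the cube.
DIMENSIONS = ("product", "city", "month")

# Hypothetical three-dimensional cube: one revenue cell per
# (product, city, month) combination. All values are invented.
CUBE = {
    ("soup", "Delhi", "July"): 10.0,
    ("soup", "Delhi", "August"): 12.0,
    ("soup", "Mumbai", "July"): 7.0,
    ("crackers", "Delhi", "July"): 4.0,
    ("crackers", "Mumbai", "August"): 6.0,
}


def roll_up(cube, dimensions, keep):
    """Sum the cube's cells down to the dimensions named in `keep`."""
    idx = [dimensions.index(d) for d in keep]
    view = defaultdict(float)
    for key, value in cube.items():
        view[tuple(key[i] for i in idx)] += value
    return dict(view)


# Two-dimensional view: revenue by product and city (months summed away).
by_product_city = roll_up(CUBE, DIMENSIONS, ("product", "city"))
# One-dimensional view: revenue by product only.
by_product = roll_up(CUBE, DIMENSIONS, ("product",))
# Zero dimensions kept: a single grand-total cell.
grand_total = roll_up(CUBE, DIMENSIONS, ())

print(by_product_city)
print(by_product)
print(grand_total)
```

Adding a fourth dimension (say, salesperson) only lengthens the key tuple and the `DIMENSIONS` list; `roll_up` itself does not change, which is the sense in which the model extends to N dimensions.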