Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Introduction to Data Warehousing From DBMS to Decision Support • DBMSs widely used to maintain transactional data • Attempts to use of these data for analysis, exploration, identification of trends etc. has led to Decision Support Systems. • Rapid Growth since mid 70’s • DBMSs vendors have answered this trend by adding new features to existing products • Rarely enough DBs for Decision Support • Trend towards Data Warehousing • Data Warehousing – consolidation of data from several databases which are in turn maintained by individual business units along with historical and summary information Characteristics of TPSs Characteristic OLTP Typical operation Update Level of analytical requirements Low Screens Unchanging Amount of data per transaction Small Data level Detailed Age of data Current Orientation Records TPS vs Decision Support OLTP Complex Analysis Information to support Historical information day-to-day service to analyze Data stored at transaction Data needs to be integrated level Database design: Normalized Database design: Denormalized, star schema MIS and Decision Support Ad hoc access Production platforms Operational reports • • • • Decision makers MIS systems provided business data Reports were developed on request Reports provided little analysis capability no personal ad hoc access to data Analyzing Data from Operational Systems ERP • Data structures are complex • Systems are designed for high performance and throughput • Data is not meaningfully represented • Data is dispersed • TPS systems unsuitable for intensive queries Production platforms Operational reports Data Extract Processing Operational systems Extracts Decision makers • End user computing offloaded from the operational environment • User’s own data Management Issues Operational systems Extracts Extract explosion • • • • Duplicated effort Multiple technologies Obsolete reports No metadata Decision makers Data Quality Issues • • • • • • • • • No common time basis Different calculation algorithms Different levels of extraction Different levels of granularity Different data field names Different data field meanings Missing information No data correction rules No drill-down capability From Extract to Warehouse DSS Internal and external systems • • • • Data warehouse Decision makers Controlled Reliable Quality information Single source of data Data Warehousing Architecture External Data Sources Visualisation Extract Clean Transform Load Metadata respository Refresh Operational Databases Data Warehouse Serves OLAP Data Mining Business Motivators • Provide superior services and products • Know the business • New products • Invest in customers • Retain customers • Invest in technology • Reinvent to face new challenges Centralised data warehouse Corporate datawarehouse Mainframe Corporate Financial Marketing Manufacturing Dis tribution Analyst Server Analyst Analyst Federated data warehouse Mainframe Corporate data warehouse Financial Analyst Marketing Analyst Manufacturing Analyst Dis tribution Analyst Tiered data warehouse Mainframe Tier 3 (detailed data) Tier 2 (summarized data) Tier 1 (highly s ummarized data) Corporate data warehouse Local data mart Works tation Analyst Data Warehouses Vs Data Marts Data Warehouse Data Mart Property Data Warehouse Data Mart Scope Enterprise Department Subjects Multiple Single-subject Data Source Many Few Size (typical) 100 GB to > 1 TB < 100 GB Implementation time Months to years Months End-user Access Tools • High performance is achieved by pre-planning the requirements for joins, summations, and periodic reports by end-users. • There are five main groups of access tools: – Data reporting and query tools – Application development tools – Executive information system (EIS) tools – Online analytical processing (OLAP) tools – Data mining tools Data Usage - $1000 questions Verification What is the average sale for in-store and catalog customers? What is the average high school GPA of students who graduate from college compared to those who do not? Discovery What is the best predictor of sales? What are the best predictors of college graduation? Need to complement RDBMS technology with a flexible, multidimensional view of data The Functionality of OLAP • Rotate and drill down • Create and examine calculated data • Determine comparative or relative differences. • Perform exception and trend analysis. • Perform advanced analytical functions The star structure Multidimensional Database Customer Store Model Store Time SALES Time FINANCE Product The data is found at the intersection of dimensions. Data Mining Data mining functions • Associations – 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta • Sequential patterns – 32 percent of female customers who order a red jacket within six months buy a gray skirt • Classifying – Frequent customers are those with incomes about $50,000 and having two or more children • Clustering – Market segmentation • Predicting – predict the revenue value of a new customer based on that personal demographic variables