Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Entity–attribute–value model wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Data center wikipedia , lookup
Clusterpoint wikipedia , lookup
Data analysis wikipedia , lookup
Forecasting wikipedia , lookup
Information privacy law wikipedia , lookup
3D optical data storage wikipedia , lookup
Data vault modeling wikipedia , lookup
An overview of Data Warehousing and OLAP Technology Presented By Manish Desai • • • • • • • • • • • • • Introduction What is data warehouse ? Explanation of definition Data warehouse Vs. Operational Database Data warehouse architecture Back end tools Conceptual model Database design Warehouse servers Index structures Meta data Conclusion References 2 Introduction • Essential elements of decision support • Enables The Knowledge Worker to make better and faster decisions • Used in many industries like: – Manufacturing (for order shipment) – Retail (for inventory management) – Financial Services (claims and risk analysis) • Every major database vendor offers product in this area 3 What is Data Warehouse ? • A data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making” • Typically maintained separately from operational databases 4 Explanation of definition • Subject-Oriented: – Designed around subject such as customer, vendor, product and activity – Does not includes data that are not needed for Decision support system (DSS) • Integrated: – Most important feature – Consistent naming convention, measurement of variables and so forth – The data should be stored in single globally acceptable fashion 5 Explanation (continues…) • Time Varying: – All data in the warehouse should be accurate as of some moment in time – Data stored over a long time horizon (5 –10 years) – Key structure contains element of time (implicitly or explicitly) – Data once correctly recorded cant be updated • Non Volatile: – No Update of data allowed – only loading and access of data operations 6 Data Warehouse Vs. Operational Database Data Warehouse Operational Database user Knowledge worker Clerk, IT professional Function Decision support Day to day operations Data Historical,summarized, multidimensional, integrated Current, up-to-date, detailed Unit of work Complex query Short, simple transaction metric Query throughout, response Transaction throughput 7 Architecture • • • • • Data sourcing,migration,cleanup tools Meta data repository Data marts Data query, reporting, analysis and mining tools Data warehouse administration and management 8 Architecture (continues…) • Distributed Data warehouse – Load balancing, scalability,higher availability – Meta data replicated and centrally administrated – Too expansive • Data marts – Departmental subset focused on selected subjects – example: marketing department includes customer, sales and product tabels – Has own repository and administration – May lead to complex integration problems if not designed properly 9 Back end tools and Utilities • Data cleaning, loading, refreshing tools • Cleaning – Multiple source, possibility of errors – Example: replace string sex by gender • Loading – Building indices, sorting and making access paths – Large amount of data • Incremental loading • Only updated tuples are inserted ,Process hard to manage • Refresh – Propagating updates – When to refresh ? – Set by administrator depending on user needs and traffic 10 Conceptual Model and front end tools • Multi dimensional view – – – – Dimensions together uniquely determine the measure Example: Sales can be represented as city,product, data Each dimension is described by set of attribute Example: product consist of • Category of product • Industry of product • Year of introduction • Front end tools – Multi dimensional spreadsheet • Supports Pivoting-reorientation • Roll_up - summarized data • Drill_down - go from high level to low level summary 11 Database design • Two ways to represent Multi dimensional model – Star schema • Database consist of single fact table and single table for each dimension • Each tuples in fact table consist of pointer to each of dimension – Snowflake schema • Refinement over star schema • Dimensional hierarchy is explicitly represented by normalizing dimension tables 12 Warehouse Servers • Specialized SQL servers – Provides advanced query language and query processing support for SQL queries over star and snowflake schemas – Example: Redbrick • ROLAP – Between relational back end and client front end tools – Extend traditional relational servers to support multidimensional queries – Example: Microstratergy • MOLAP – Multidimensional storage engine – Direct mapping – Example: Essbase from Arbor Inc. 13 Index structures • Bit map indices – Use single bit to indicate specific value of attribute – Example: instead of storing eight characters to record “engineer” as skill of employee use single bit id# Name Skill 1000 John 1 • Join indices – Maintains the relationship between foreign key with its matching primary keys 14 Meta data and warehouse management • Its data about data • Used for building, maintain, managing and using data warehouse • Administrative meta data – Information about setting up and using warehouse • Business meta data – Business terms and definition • Operational meta data – Information collected during operation of warehouse 15 Conclusion • Data warehouse is the technology for the future. • data warehouse enables knowledge worker to make faster and better decisions 16 References • • • • • Inmon W. H.,Building the data warehouse www.olapcouncil.org www.pwp.starnetinc.com www.arborsoft.com Kimball, R. The data warehouse toolkit. 17