IIO30120 Database Design
Data warehouse design
Based on the book Hovi, Huotari, Lahdenmäki: Tietokantojen suunnittelu & indeksointi, Docendo (2003, 2005), chapter 8
© Jouni Huotari & Ari Hovi, 18.10.2015
Some slides from http://www.slideshare.net/idnats/datawarehousing-and-data-mining-presentation-725476

For discussion
• What is a data warehouse?
• How does the term Business Intelligence relate to data warehousing?
• How does a data mart differ from a data warehouse?
• Why is OLAP (online analytical processing) important?
• What is the basic idea behind ETL?

What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
  – A decision support database that is maintained separately from the organization's operational database
  – Supports information processing by providing a solid platform of consolidated, historical data for analysis
• "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
• Data warehousing: the process of constructing and using data warehouses

Basic design principles for data warehouses
The goal is to support fast summary queries, analysis, and reporting
• This is often difficult in operational databases, but some solutions exist
• Can you mention any solutions?
It is important to
• Maintain history for seeing trends etc.
• Clean up data from different data sources and make it consistent
• Make the structure clear and understandable

Example: a producer wants to know...
• Which are our lowest/highest margin customers?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Who are my customers and what products are they buying?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
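The first producer question above (lowest/highest margin customers) is a typical summary query. A minimal sketch in Python, assuming the sales rows have already been consolidated into (customer, revenue, cost) tuples; the customer names, the figures, and the function name `margin_by_customer` are made up for illustration and do not come from the slides:

```python
from collections import defaultdict

# Hypothetical, already consolidated sales rows: (customer, revenue, cost).
SALES = [
    ("Acme", 1000.0, 700.0),
    ("Acme", 500.0, 400.0),
    ("Birch", 800.0, 760.0),
    ("Cedar", 300.0, 150.0),
]


def margin_by_customer(rows):
    """Return {customer: margin ratio}, with margin = (revenue - cost) / revenue."""
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for customer, row_revenue, row_cost in rows:
        revenue[customer] += row_revenue
        cost[customer] += row_cost
    # Customers without revenue are skipped to avoid dividing by zero.
    return {
        customer: (revenue[customer] - cost[customer]) / revenue[customer]
        for customer in revenue
        if revenue[customer] > 0
    }


margins = margin_by_customer(SALES)
ranked = sorted(margins.items(), key=lambda pair: pair[1])
lowest, highest = ranked[0], ranked[-1]
```

In a real warehouse the same aggregation would normally run inside the database over a fact table, not in application code.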
[Figure: back room and front room of a data warehouse. © Kimball, Caserta: The Data Warehouse ETL Toolkit (2004)]

Data Warehouse vs. Operational DBMS
• OLTP (on-line transaction processing)
  – Major task of a traditional relational DBMS
  – Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
• OLAP (on-line analytical processing)
  – Major task of a data warehouse system
  – Data analysis and decision making
• Distinct features (OLTP vs. OLAP):
  – User and system orientation: customer vs. market
  – Data contents: current, detailed vs. historical, consolidated
  – Database design: ER + application vs. star + subject
  – View: current, local vs. evolutionary, integrated
  – Access patterns: update vs. read-only but complex queries

Conceptual Modeling of Data Warehouses
• Star schema: a fact table in the middle connected to a set of dimension tables
• Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
• Fact constellation: multiple fact tables share dimension tables; viewed as a collection of stars, it is also called a galaxy schema
• Dimensions describe who, what, when, where and why for the facts.
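A star schema can be tried out with Python's built-in `sqlite3` module. The sketch below is an illustration under assumptions, not the authors' design: the `_dim` / `_fact` table names, the reduced column set, and all sample rows are hypothetical. It builds one fact table with three dimension tables and runs a summary query that joins the fact table to two of the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, three dimension tables around it.
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold INTEGER,
    dollars_sold REAL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, 2015, "Q1"), (2, 2015, "Q2")])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?)",
                 [(1, "TV-A", "TV"), (2, "PC-B", "PC")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                 [(1, "Helsinki", "Finland"), (2, "Toronto", "Canada")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 3, 1500.0),
    (1, 2, 1, 1, 900.0),
    (2, 1, 2, 2, 1000.0),
    (2, 1, 1, 1, 500.0),
])


def sales_by_country_and_type(connection):
    """Summary query: total dollars sold per (country, item type)."""
    query = """
        SELECT l.country, i.type, SUM(f.dollars_sold) AS total
        FROM sales_fact AS f
        JOIN item_dim AS i ON i.item_key = f.item_key
        JOIN location_dim AS l ON l.location_key = f.location_key
        GROUP BY l.country, i.type
        ORDER BY l.country, i.type
    """
    return connection.execute(query).fetchall()


result = sales_by_country_and_type(conn)
```

The fact table holds only keys and measures, so each summary query is a join from the fact table to the dimensions it needs, followed by a `GROUP BY` on dimension attributes.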
Example of Star Schema
[Figure: example of a star schema]

Another example of Star Schema
• Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, state_or_province, country

Example of Snowflake Schema
• Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_key
• supplier: supplier_key, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city_key
• city: city_key, city, state_or_province, country

Example of Fact Constellation
• Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, province_or_state, country
• shipper: shipper_key, shipper_name, location_key

Multidimensional Data
• Sales volume as a function of product, month, and region
• Dimensions: Product, Location, Time
• Hierarchical summarization paths:
  – Product: Industry → Category → Product
  – Location: Region → Country → City → Office
  – Time: Year → Quarter → Month or Week → Day

A Sample Data Cube
[Figure: data cube with the axes Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum) and Country (U.S.A, Canada, Mexico, sum); the highlighted cell is the total annual sales of TV in U.S.A.]
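The "sum" cells of the sample data cube can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the sales amounts and the function name `build_cube` are invented for illustration, and the cube is held in a plain dictionary keyed by (product, country, quarter). The marker `ALL` stands for a dimension that has been summed away.

```python
from collections import defaultdict
from itertools import product

ALL = "sum"  # marks a dimension that has been aggregated away, as in the figure

# Hypothetical sales facts: (product, country, quarter, amount).
FACTS = [
    ("TV", "U.S.A", "1Qtr", 100),
    ("TV", "U.S.A", "2Qtr", 120),
    ("TV", "Canada", "1Qtr", 40),
    ("PC", "U.S.A", "1Qtr", 200),
    ("VCR", "Mexico", "4Qtr", 30),
]


def build_cube(facts):
    """Aggregate facts into every cell of the cube, including all 'sum' cells."""
    cube = defaultdict(int)
    for item, country, quarter, amount in facts:
        # Each dimension is either kept or rolled up to ALL,
        # so every fact contributes to 2 * 2 * 2 = 8 cells.
        for key in product((item, ALL), (country, ALL), (quarter, ALL)):
            cube[key] += amount
    return dict(cube)


cube = build_cube(FACTS)
tv_usa_annual = cube[("TV", "U.S.A", ALL)]  # total annual sales of TV in U.S.A.
grand_total = cube[(ALL, ALL, ALL)]         # the corner cell of the cube
```

Precomputing every roll-up like this is only practical for small cubes; OLAP tools typically materialize a subset of the aggregates and compute the rest on demand.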
Browsing a Data Cube
• Visualization
• OLAP capabilities
• Interactive manipulation

Exercises
• Install the tool from http://www.olapcube.com/ on your virtual machine, play with the dimensions, create a cube, and examine the result (dashboard)
• http://www.olapcube.com/help/writer/panorama/
• http://www.pentaho.com/testdrive
• http://www.databaseanswers.org/downloads/Data_Warehousing_by_Example.pdf
• Example (real) data: http://www.gapminder.org/data/

What is Business Intelligence (BI)?
• BI refers to skills, technologies, applications and practices used to help a business acquire a better understanding of its commercial context (Wikipedia)
• The goal is to gain insight into the business by bringing together data, formatting it in a way that enables better analysis, and then providing tools that give users power, not just to examine and explore the data, but to quickly understand it. (Business Intelligence with Microsoft Office PerformancePoint Server)

More information from Wikipedia:
• http://en.wikipedia.org/wiki/Data_warehouse
• http://en.wikipedia.org/wiki/Data_mart
• http://en.wikipedia.org/wiki/Star_schema
• http://en.wikipedia.org/wiki/Snowflake_schema
• http://en.wikipedia.org/wiki/OLAP