* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 2.JiaoDaCube
Entity–attribute–value model wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Clusterpoint wikipedia , lookup
Data center wikipedia , lookup
Forecasting wikipedia , lookup
Data analysis wikipedia , lookup
Information privacy law wikipedia , lookup
3D optical data storage wikipedia , lookup
Data vault modeling wikipedia , lookup
Cube Intro Decision Making Effective decision making Goal: Choice that moves an organization closer to an agreed-on set of goals in a timely manner Needs supports Goals – needs to be specific Not to hit an iceberg – for Titanic or other boats Measures – needs to be concrete Away from any visible iceberg Feedback – needs to be timely and accurate We hit an iceberg, rather than saw one – not timely enough The boat can sink, actually, again too late to find out BI – Business Intelligence Is the delivery of accurate, useful information to the appropriate decision makers within necessary timeframe to support effective decision making It is a superset of what can be provided by data mining SQL Server 2014 is a decent tool for this and support both Cubes/Data warehouse and Data mining OLAP/Cube Term coined in mid-1990’s when database people needed a new research direction Main goal: support ad-hoc but complex querying performed by business analysts Interactive process of creating, managing, analyzing and reporting on data Extends spreadsheet-like analysis to work with huge amounts of data in a data warehouse Known vs. Unknown Questions When using SQL statements or even Cubes, we know the question and can formulate the SQL statements (most of the time) to dig out answers to the questions Layout let answers – SQL Data-led answers – the drill down in a cube dimension (if NW sales figure is low, which state is the main contributor?) When we do not know what to ask, we use data mining – book’s definition is less accurate What should be the sources OLTP is the source On line transaction Process These are the things we have learned in database classes But OLTP, the direct data store is not suitable for BI because Not to fight for computing resources of OLTP systems OLTP may not have all the history readily available BI may have to use data from several OLTP systems The solution is generally Data Mart Data marts Data marts are related sets of data that are grouped together and separated out from the main body of data. It can be a repository of OLTP data or portion of data from a data warehouse Data in a Data Mart are not meant for supporting real time operations Data in a Data mart are generally have gone through the data clearing process More on OLAP Place key performance indicators (measures) into context (dimensions) measures are pre-aggregated data retrieval is significantly faster modeled in a DW The processed “cube” is made available to business analysts who can browse the data using a variety of tools, making ad hoc interactive and analytical processing OLAP versus OLTP OLAP versus OLTP Measures and dimensions Measures: key performance indicators, such as sales amount, that you would like to evaluate typically numerical, including volume, sales, and costs a rule of thumb: if a number makes (business/social) sense when aggregated, then it is a measure examples: aggregate daily volume to month, quarter and year aggregating telephone numbers would not make sense therefore, telephone numbers are not measures postcode: not a measure, but can be a dimension (?) Totals in all the transactions would be a measure Number of high school students by school, county, state should be a measure Measures and dimensions Dimensions: categories of data analysis typical dimensions include product, time, region a rule of thumb: when a report is requested to be ordered "by“ something, that something is usually a dimension e.g. in sales report: view sales by month, by region, so the two dimensions needed are time and region Star Schema Dimensions and measures are physically represented by a star schema (typically) arrange the dimension tables around a central fact table that contains the measures a fact table contains a column for each measure as well as a column for each dimension Example Dimension tables Market (Store_ID, City, Region) Product (Product_ID, Name, Category, Price) Time (Time_ID, Week, Month, Quarter, Year) Fact table Sales (Store_ID, Product_ID, Time_ID, Amount) Notes Amount should be the measure Time_ID is preprocessed to show at week or month level This can be modified to show Snowflakes Example supermarket sales roll-up: summaries over all cities, regions slice the cube to select sales only in day 3 dice the cube to select sales only in week 2 (days 8-14), and group by regions The TWO Multi-dimensional model What we meant mostly when referring a cube or a data warehouse Tabular model Complicated in SQL Server May be accomplished easily with other tools such as vertica SSMS and SSDT SSDT (SQL Server Data Tool), used to be the BI studio, is really the Visual Studio 2013 Give a demo