* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Data Warehousing
Entity–attribute–value model wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Data center wikipedia , lookup
Forecasting wikipedia , lookup
Data analysis wikipedia , lookup
3D optical data storage wikipedia , lookup
Information privacy law wikipedia , lookup
Data vault modeling wikipedia , lookup
Data Warehousing On-Line Analytical Processing (OLAP) Tools • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques • Relational OLAP (ROLAP) – Traditional relational representation • Multidimensional OLAP (MOLAP) – Cube structure • OLAP Operations – Cube slicing–come up with 2-D view of data – Drill-down–going from summary to more detailed views – Roll-up – the opposite direction of drill-down – Reaggregation – rearrange the order of dimensions Slicing a data cube Summary report Example of drill-down Starting with summary data, users can obtain details for particular cells Drill-down with color added Excel’s Pivot Table • Data/Pivot Table – Drilldown, rollup, reaggregation Access Pivot Form Drill Down Data Warehouse • A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes – Subject-oriented: e.g. customers, employees, locations, products, time periods, etc. • Dimensions for data analysis – Integrated: Consistent naming conventions, formats, encoding structures; from multiple data sources – Time-variant: Can study trends and changes – Nonupdatable: Read-only, periodically refreshed • Data Mart: – A data warehouse that is limited in scope Need for Data Warehousing • Integrated, company-wide view of high-quality information (from disparate databases) • Separation of operational and informational systems and data (for improved performance) Generic two-level data warehousing architecture L T One, companywide warehouse E Periodic extraction data is not completely current in warehouse The ETL Process • • • • Capture/Extract Scrub or data cleansing Transform Load and Index ETL = Extract, transform, and load Capture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse Static extract = capturing a snapshot of the source data at a point in time Incremental extract = capturing changes that have occurred since the last static extract Data Warehouse Design - Star Schema • Also called “dimensional model” • Fact table – contain detailed business data • Dimension tables – contain descriptions about the subjects of the business such as customers, employees, locations, products, time periods, etc. – A dimension is a term used to describe any category used in analyzing data, such as time, geography, and product line. Star schema example Fact table provides statistics for sales broken down by product, period and store dimensions Dimension tables contain descriptions about the subjects of the business Star schema with sample data Example: Order Processing System CID Cname City OID ODate Rating SalesPerson Customer 1 Has M Order M Qty Has M Product Price PID Pname Star Schema Location Dimension LocationCode State City Can group by State, City Product Category CategoryID Description (Snowflake model) FactTable LocationCode PeriodCode Rating PID Qty Amount Product Dimension PID Pname CategoryID CustomerRating Dimension Rating Description Period Dimension PeriodCode Year Quarter From SalesDB to MyDataWarehouse • Extract data from SalesDB: – Create query to get the data – Download to MyDataWareHouse • File/Import/Save as Table • Data scrub/cleasing,and transform: – Transform City to Location – Transform Odate to Period • Load data to FactTable Figure 6-8 Bitmap index index organization Bitmap saves on space requirements Rows - possible values of the attribute Columns - table rows Bit indicates whether the attribute of a row has the values Figure 6-9 Join Indexes–speeds up join operations Data Mining and Visualization • Knowledge discovery using a blend of statistical, AI, and computer graphics techniques • Goals: – Explain observed events or conditions – Confirm hypotheses – Explore data for new or unexpected relationships • Techniques – – – – – – – – – Statistical regression Decision tree induction Clustering and signal processing Affinity Sequence association Case-based reasoning Rule discovery Neural nets Fractals • Data visualization–representing data in graphical/multimedia formats for analysis SQL GROUPING SETS • GROUPING SETS – SELECT CITY,RATING,COUNT(CID) FROM HCUSTOMERS – GROUP BY GROUPING SETS(CITY,RATING,(CITY,RATING),()) – ORDER BY CITY; • Note: () indicates that an overall total is desired. SQL CUBE • Perform aggregations for all possible combinations of columns indicated. – SELECT CITY,RATING,COUNT(CID) FROM HCUSTOMERS – GROUP BY CUBE(CITY,RATING) – ORDER BY CITY, RATING; SQL ROLLUP • The ROLLUP extension causes cumulative subtotals to be calculated for the columns indicated. If multiple columns are indicated, subtotals are performed for each of the columns except the far-right column. – SELECT CITY,RATING,COUNT(CID) FROM HCUSTOMERS – GROUP BY ROLLUP(CITY,RATING) – ORDER BY CITY, RATING