Data Warehousing
What is a Data Warehouse, Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehouse to Data Mining
May 7, 2017 Data Mining: Concepts and Techniques

What is a Data Warehouse?
• A decision support database that is maintained separately from the organization’s operational database.
• Supports information processing by providing a solid platform of consolidated, historical data for analysis.
• Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions.
• Data warehousing: the process of constructing and using data warehouses.

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (William H. Inmon)
The four keywords (subject-oriented, integrated, time-variant, and nonvolatile) distinguish data warehouses from other data repository systems such as relational database systems, transaction processing systems, and file systems.
• Time-variant: data are stored to provide information from a historical perspective.
• Nonvolatile: a data warehouse is always a physically separate store of data transformed from the application data found in the operational environment.

Data Warehouse: Subject-Oriented
• Organized around major subjects, such as customer, product, and sales.
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
• Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

Data Warehouse: Integrated
• Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
• Data cleaning and data integration techniques are applied.
• Consistency is ensured in naming conventions, encoding structures, attribute measures, etc. among different data sources. E.g., hotel price: currency, tax, breakfast covered, etc.
• When data is moved to the warehouse, it is converted.

Data Warehouse: Time-Variant
• The time horizon for the data warehouse is significantly longer than that of operational systems.
  • Operational database: current value data.
  • Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years).
• Every key structure in the data warehouse contains an element of time, explicitly or implicitly, whereas the key of operational data may or may not contain a “time element”.

Data Warehouse: Nonvolatile
• A physically separate store of data transformed from the operational environment.
• Operational update of data does not occur in the data warehouse environment.
  • It does not require transaction processing, recovery, and concurrency control mechanisms.
  • It requires only two operations in data accessing: initial loading of data and access of data.

Differences between Operational Database Systems and Data Warehouses
The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. Data warehouse systems, on the other hand, serve users or knowledge workers in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.

1. Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
2. Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data.
3. Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model and a subject-oriented database design.
4. View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. An OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.
5. Access patterns: An OLTP system consists mainly of short, atomic transactions (the system requires concurrency control and recovery mechanisms). An OLAP system is accessed mostly through read-only operations (historical rather than up-to-date information), many of which are complex queries.

OLTP vs. OLAP

                     OLTP                              OLAP
users                clerk, IT professional            knowledge worker
function             day-to-day operations             decision support
DB design            application-oriented              subject-oriented
data                 current, up-to-date, detailed,    historical, summarized, multidimensional,
                     flat relational, isolated         integrated, consolidated
usage                repetitive                        ad-hoc
access               read/write, index/hash on         lots of scans
                     prim. key
unit of work         short, simple transaction         complex query
# records accessed   tens                              millions
# users              thousands                         hundreds
DB size              100MB-GB                          100GB-TB
metric               transaction throughput            query throughput, response

Multidimensional Data Model
• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
  • Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
  • The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables.
• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
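The relationship between base cuboid, apex cuboid, and the lattice can be made concrete with a small sketch. The Python below is only an illustration, not part of the original slides: the function name `cuboid_lattice` is hypothetical, and the four dimension names (time, item, location, supplier) are the ones used in the deck's lattice figure. It treats every subset of the dimensions as one cuboid.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Map k -> list of k-D cuboids, each cuboid being a tuple of dimension names.

    Hypothetical helper for illustration: with no concept hierarchies, every
    subset of the dimension list is one cuboid of the data cube.
    """
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


# Illustrative dimension names taken from the lattice figure.
dims = ["time", "item", "location", "supplier"]
lattice = cuboid_lattice(dims)

for k, cuboids in lattice.items():
    print(f"{k}-D cuboids ({len(cuboids)}): {cuboids}")

apex = lattice[0][0]           # the empty grouping, i.e. "all"
base = lattice[len(dims)][0]   # the grouping on every dimension
total = sum(len(cuboids) for cuboids in lattice.values())
print("apex:", apex, "base:", base, "total cuboids:", total)
```

For four dimensions the sketch yields 1 + 4 + 6 + 4 + 1 = 16 cuboids, with the empty tuple playing the role of the apex cuboid and the full tuple the base cuboid.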
Cube: A Lattice of Cuboids
[Figure: lattice of cuboids over the dimensions time, item, location, supplier]
• 0-D (apex) cuboid: all
• 1-D cuboids: (time), (item), (location), (supplier)
• 2-D cuboids: (time, item), (time, location), (time, supplier), (item, location), (item, supplier), (location, supplier)
• 3-D cuboids: (time, item, location), (time, item, supplier), (time, location, supplier), (item, location, supplier)
• 4-D (base) cuboid: (time, item, location, supplier)

Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
• Star schema: a fact table in the middle connected to a set of dimension tables.
• Snowflake schema: a refinement of the star schema in which some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
• Fact constellation: multiple fact tables share dimension tables; viewed as a collection of stars, it is therefore also called a galaxy schema.

Example of Star Schema
• Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, state_or_province, country

Example of Snowflake Schema
• Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_key
• supplier: supplier_key, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city_key
• city: city_key, city, state_or_province, country

Example of Fact Constellation
• Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• Shipping fact table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, province_or_state, country
• shipper: shipper_key, shipper_name, location_key, shipper_type

Cube Definition Syntax in DMQL
Cube definition:
    define cube <cube_name> [<dimension_list>]: <measure_list>
Dimension definition (dimension table):
    define dimension <dimension_name> as (<attribute_or_subdimension_list>)

Defining Star Schema in DMQL
    define cube sales_star [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)

Defining Snowflake Schema in DMQL
    define cube sales_snowflake [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier(supplier_key, supplier_type))
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city(city_key, province_or_state, country))

Defining Fact Constellation in DMQL
    define cube sales [time, item, branch, location]:
        dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)
    define cube shipping [time, item, shipper, from_location, to_location]:
        dollar_cost = sum(cost_in_dollars), unit_shipped = count(*)
    define dimension time as time in cube sales
    define dimension item as item in cube sales
    define dimension shipper as (shipper_key, shipper_name, location as location in cube sales, shipper_type)
    define dimension from_location as location in cube sales
    define dimension to_location as location in cube sales

Measures of Data Cube: Three Categories
• Distributive: the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning. E.g., count(), sum(), min(), max()
• Algebraic: the measure can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function. E.g., avg(), min_N(), standard_deviation()
• Holistic: there is no constant bound on the storage size needed to describe a subaggregate. E.g., median(), mode()

DMQL to SQL
    SELECT s.time_key, s.item_key, s.branch_key, s.location_key,
           SUM(s.number_of_units_sold * s.price),
           SUM(s.number_of_units_sold)
    FROM time t, item i, branch b, location l, sales s
    WHERE s.time_key = t.time_key
      AND s.item_key = i.item_key
      AND s.branch_key = b.branch_key
      AND s.location_key = l.location_key
    GROUP BY s.time_key, s.item_key, s.branch_key, s.location_key

Concept Hierarchies
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.

A Concept Hierarchy: Dimension (location)
[Figure: concept hierarchy for location, from the most general level to the most specific]
• all: all
• region: Europe, ..., North_America
• country: Germany, ..., Spain, Canada, ..., Mexico
• city: Frankfurt, ..., Vancouver, ..., Toronto
• office: L. Chan, ..., M. Wind

A concept hierarchy that is a total or partial order among attributes in a database schema is called a schema hierarchy.

Concept hierarchies may also be defined by grouping values for a given dimension or attribute, resulting in a set-grouping hierarchy. E.g., {1..10} < inexpensive

[Figure: View of Warehouses and Hierarchies]

OLAP Operations in the Multidimensional Data Model
• Roll-up: performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
• Drill-down: the reverse of roll-up. It navigates from less detailed data to more detailed data. Drill-down can be realized either by stepping down a concept hierarchy for a dimension or by introducing additional dimensions.

[Fig. 3.10: Typical OLAP Operations]

OLAP Operations in the Multidimensional Data Model (continued)
• Slice and dice: The slice operation performs a selection on one dimension of the given cube, resulting in a subcube. The dice operation defines a subcube by performing a selection on two or more dimensions.
• Pivot (rotate): a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data.
• Other operations:
  • Drill across: involves (goes across) more than one fact table.
  • Drill through: goes through the bottom level of the cube to its back-end relational tables (using SQL).

A Star-Net Query Model
[Figure: star-net query model] Each circle is called a footprint.

Chapter 3: Data Warehousing and OLAP Technology: An Overview
• What is a data warehouse?
• A multi-dimensional data model
• Data warehouse architecture
• Data warehouse implementation
• From data warehousing to data mining

Data Warehouse Architecture
Four different views regarding the design of a data warehouse must be considered:
• The top-down view allows the selection of the relevant information necessary for the data warehouse.
• The data source view exposes the information being captured, stored, and managed by operational systems.
• The data warehouse view includes fact tables and dimension tables. It represents the information that is stored inside the data warehouse.
• The business query view is the perspective of the data warehouse from the viewpoint of the end user.

The Process of Data Warehouse Design
A data warehouse can be built using a top-down approach, a bottom-up approach, or a combination of both.
• The top-down approach starts with the overall design and planning. It is useful in cases where the technology is mature and well known, and where the business problems that must be solved are clear and well understood.
• The bottom-up approach starts with experiments and prototypes. This is useful in the early stage of business modeling and technology development. It allows an organization to move forward at considerably less expense and to evaluate the benefits of the technology.

From the software engineering point of view:
• Waterfall: structured and systematic analysis at each step before proceeding to the next.
• Spiral: rapid generation of increasingly functional systems, with short turnaround time.

The warehouse design consists of the following steps:
1. Choose a business process to model. If the business process is organizational and involves multiple complex object collections, a data warehouse model should be followed.
2. Choose the grain of the business process. The grain is the fundamental, atomic level of data to be represented in the fact table for this process.
3. Choose the dimensions that will apply to each fact table record.
4. Choose the measures that will populate each fact table record.

Three-Tier Data Warehouse Architecture
[Figure: three-tier data warehouse architecture]
• Data sources: operational DBs and other sources, fed through Extract, Transform, Load, and Refresh (Monitor & Integrator, Metadata)
• Data storage: data warehouse and data marts
• OLAP engine: OLAP server
• Front-end tools: analysis, query, reports, data mining

Data warehouses often adopt a three-tier architecture.
1. The bottom tier is a warehouse database server that is almost always a relational database system. Back-end tools and utilities are used to feed data into the bottom tier from operational databases or other external sources. These tools and utilities perform data extraction, cleaning, and transformation.
2. The middle tier is an OLAP server that is typically implemented using either a relational OLAP (ROLAP) model, that is, an extended relational DBMS, or a multidimensional OLAP (MOLAP) model, that is, a special-purpose server that directly implements multidimensional data and operations.
3. The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and data mining tools.

Three Data Warehouse Models
• Enterprise warehouse: collects all of the information about subjects spanning the entire organization. It typically contains detailed data as well as summarized data. An enterprise data warehouse may be implemented on traditional mainframes, computer superservers, or parallel architecture platforms.
• Data mart: contains a subset of corporate-wide data that is of value to a specific group of users. The scope is confined to specific selected subjects. The implementation cycle of a data mart is more likely to be measured in weeks rather than months or years. Depending on the source of data, data marts can be categorized as independent or dependent.
• Virtual warehouse: a set of views over operational databases. It is easy to build but requires excess capacity on operational database servers.
Data Warehouse Back-end Tools and Utilities
• Data extraction, which typically gathers data from multiple, heterogeneous, and external sources
• Data cleaning, which detects errors in data and rectifies them when possible
• Data transformation, which converts data from legacy or host format to warehouse format
• Load, which sorts, summarizes, consolidates, computes views, checks integrity, and builds indices and partitions
• Refresh, which propagates the updates from the data sources to the warehouse

Metadata Repository
Metadata are data about data. When used in a data warehouse, metadata are the data that define the warehouse objects. A metadata repository contains the following:
• A description of the structure of the data warehouse, which includes the warehouse schema, views, dimensions, hierarchies, and derived data definitions, as well as data mart locations and contents
• Operational metadata, which include data lineage (history of migrated data and the sequence of transformations applied to it), currency of data, and monitoring information
• The algorithms used for summarization, which include measure and dimension definition algorithms, data on granularity, partitions, subject areas, aggregation, summarization, and predefined queries and objects
• The mapping from the operational environment to the data warehouse, which includes source databases and their contents, gateway descriptions, data partitions, data extraction, cleaning, and transformation rules and defaults, data refresh and purging rules, and security
• Data related to system performance, which include indices and profiles that improve data access and retrieval performance
• Business metadata, which include business terms and definitions, data ownership information, and charging policies

Types of OLAP Servers: ROLAP vs. MOLAP vs. HOLAP
OLAP servers present business users with multidimensional data from data warehouses or data marts, without concerns regarding how or where the data are stored.
Implementations of a warehouse server for OLAP processing include the following:
• Relational OLAP (ROLAP) servers: intermediate servers that stand between a relational back-end server and client front-end tools. They use a relational or extended-relational DBMS to store and manage warehouse data.
• Multidimensional OLAP (MOLAP) servers: support multidimensional views of data through array-based multidimensional storage engines. They map multidimensional views directly to data cube array structures.
• Hybrid OLAP (HOLAP) servers: combine ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP.
• Specialized SQL servers: to meet the growing demand for OLAP processing in relational databases, some database system vendors implement specialized SQL servers that provide advanced query language and query processing support for SQL queries over star and snowflake schemas in a read-only environment.

Data Warehouse Implementation
Data warehouses contain huge volumes of data. OLAP servers demand that decision support queries be answered on the order of seconds. Answering such queries efficiently depends on computing aggregations across many sets of dimensions. In SQL terms, these aggregations are referred to as group-by’s. Each group-by can be represented by a cuboid, where the set of group-by’s forms a lattice of cuboids defining a data cube.
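To illustrate how a set of group-by's makes up a data cube, the following Python sketch aggregates a tiny in-memory table once per subset of the dimensions. It is a naive illustration under stated assumptions, not how an OLAP server computes cubes: the function name `compute_cube` and the three sample rows are hypothetical, and the dimension names (item, city, year) and the measure (amount) are borrowed from the deck's cube-operation example.

```python
from collections import defaultdict
from itertools import combinations


def compute_cube(rows, dimensions, measure):
    """Return {group_by_tuple: {cell_key: SUM(measure)}} for every cuboid.

    Naive full materialization: one pass over the rows per group-by.
    """
    cube = {}
    for k in range(len(dimensions) + 1):
        for group_by in combinations(dimensions, k):
            totals = defaultdict(float)
            for row in rows:
                # The cell key is the row's values on the grouped dimensions.
                key = tuple(row[d] for d in group_by)
                totals[key] += row[measure]
            cube[group_by] = dict(totals)
    return cube


# Hypothetical sample data, for illustration only.
sales = [
    {"item": "TV", "city": "Vancouver", "year": 2004, "amount": 100.0},
    {"item": "TV", "city": "Toronto", "year": 2004, "amount": 150.0},
    {"item": "PC", "city": "Vancouver", "year": 2005, "amount": 200.0},
]

cube = compute_cube(sales, ["item", "city", "year"], "amount")
print(len(cube))        # number of group-by's (cuboids): 8
print(cube[()])         # apex cuboid: {(): 450.0}
print(cube[("item",)])  # 1-D cuboid on item
```

Each key of `cube` corresponds to one group-by, so three dimensions give eight cuboids. Scanning the base data once per cuboid is exactly the cost that precomputation and partial materialization aim to avoid.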
Cube Operation
• Cube definition and computation in DMQL:
    define cube sales [item, city, year]: sum(sales_in_dollars)
    compute cube sales
• Transform it into a SQL-like language (with a new operator, cube by, introduced by Gray et al. ’96):
    SELECT item, city, year, SUM(amount)
    FROM SALES
    CUBE BY item, city, year
• This requires computing the following group-bys:
    (item, city, year),
    (item, city), (item, year), (city, year),
    (item), (city), (year),
    ()

Lattice of cuboids making a 3-D data cube
• 0-D (apex) cuboid: ()
• 1-D cuboids: (city), (item), (year)
• 2-D cuboids: (city, item), (city, year), (item, year)
• 3-D (base) cuboid: (city, item, year)

An n-dimensional data cube without concept hierarchies contains $2^n$ cuboids.
How many cuboids are there in an n-dimensional cube in which dimension i has $L_i$ levels?
    $T = \prod_{i=1}^{n} (L_i + 1)$

Partial Materialization: Selected Computation of Cuboids
There are three choices for data cube materialization, given a base cuboid:
1. No materialization: do not precompute any of the “nonbase” cuboids. This leads to computing expensive multidimensional aggregates on the fly, which can be extremely slow.
2. Full materialization: precompute all of the cuboids. The resulting lattice of computed cuboids is referred to as the full cube. This typically requires a huge amount of memory.
3. Partial materialization: selectively compute a proper subset of the whole set of possible cuboids.
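The cost of full materialization follows from the cuboid-count formula T = (L_1 + 1)(L_2 + 1)...(L_n + 1), where L_i is the number of hierarchy levels of dimension i and the extra 1 accounts for the virtual top level "all". A minimal Python check of that arithmetic is sketched below; the function name `total_cuboids` and the example level counts are illustrative assumptions, and `math.prod` requires Python 3.8 or later.

```python
from math import prod


def total_cuboids(levels_per_dimension):
    """Total cuboids T = product of (L_i + 1) over all dimensions.

    levels_per_dimension[i] is L_i, the number of concept-hierarchy levels
    of dimension i, not counting the virtual top level "all".
    """
    return prod(levels + 1 for levels in levels_per_dimension)


# Three dimensions with a single level each: the formula reduces to 2^3.
print(total_cuboids([1, 1, 1]))   # 8

# Illustrative assumption: ten dimensions with four levels each.
print(total_cuboids([4] * 10))    # 5^10 = 9765625
```

Even modest hierarchies push the count into the millions, which is why selecting a proper subset of cuboids to materialize is usually the practical choice.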
Iceberg Cube
• Compute only the cuboid cells whose count or other aggregate satisfies a condition such as
    HAVING COUNT(*) >= minsup
• Motivation:
  • Only a small portion of cube cells may be “above the water” in a sparse cube.
  • Only calculate “interesting” cells: data above a certain threshold.
  • Avoid explosive growth of the cube.

Indexing OLAP Data: Bitmap Index
• Index on a particular column.
• Each value in the column has a bit vector; bit operations are fast.
• The length of the bit vector is the number of records in the base table.
• The i-th bit is set if the i-th row of the base table has the value for the indexed column.
• Not suitable for high-cardinality domains.

Base table
    Cust   Region    Type
    C1     Asia      Retail
    C2     Europe    Dealer
    C3     Asia      Dealer
    C4     America   Retail
    C5     Europe    Dealer

Index on Region
    RecID  Asia  Europe  America
    1      1     0       0
    2      0     1       0
    3      1     0       0
    4      0     0       1
    5      0     1       0

Index on Type
    RecID  Retail  Dealer
    1      1       0
    2      0       1
    3      0       1
    4      1       0
    5      0       1

Indexing OLAP Data: Join Indices
• Join index: JI(R-id, S-id), where R(R-id, …) is joined with S(S-id, …).
• Traditional indices map the values to a list of record ids.
  • A join index materializes the relational join in the JI file and speeds up relational joins.
• In data warehouses, a join index relates the values of the dimensions of a star schema to rows in the fact table.
  • E.g., fact table Sales and two dimensions, city and product: a join index on city maintains, for each distinct city, a list of R-IDs of the tuples recording the sales in that city.
  • Join indices can span multiple dimensions.

Efficient Processing of OLAP Queries
• Determine which operations should be performed on the available cuboids.
  • Transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection.
• Determine which materialized cuboid(s) should be selected for the OLAP operation.
  • Let the query to be processed be on {brand, province_or_state} with the condition “year = 2004”, and let there be 4 materialized cuboids available:
    1) {year, item_name, city}
    2) {year, brand, country}
    3) {year, brand, province_or_state}
    4) {item_name, province_or_state} where year = 2004
    Which should be selected to process the query?
• Explore indexing structures and compressed vs. dense array structures in MOLAP.

Data Warehouse Usage
Three kinds of data warehouse applications:
• Information processing: supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, and graphs.
• Analytical processing: multidimensional analysis of data warehouse data; supports basic OLAP operations such as slice-dice, drilling, and pivoting.
• Data mining: knowledge discovery from hidden patterns; supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.

From On-Line Analytical Processing to On-Line Analytical Mining
On-line analytical mining (OLAM) integrates on-line analytical processing (OLAP) with data mining and mining knowledge in multidimensional databases. OLAM is particularly important for the following reasons:
• High quality of data in data warehouses: most data mining tools need to work on integrated, consistent, and cleaned data, which requires costly data cleaning, data integration, and data transformation as preprocessing steps. A data warehouse built through such preprocessing already provides data of this quality.
• Available information processing infrastructure surrounding data warehouses: comprehensive information processing and data analysis infrastructures have been or will be systematically constructed surrounding data warehouses, including ODBC, OLE DB, Web access, service facilities, and reporting and OLAP tools.
• OLAP-based exploratory data analysis: effective data mining needs exploratory data analysis, i.e., mining with drilling, dicing, pivoting, etc.
• On-line selection of data mining functions: integration and swapping of multiple mining functions, algorithms, and tasks.

An OLAM System Architecture
[Figure: OLAM system architecture, from top layer to bottom layer]
• Layer 4, user interface: user GUI API (accepts mining queries, returns mining results)
• Layer 3, OLAP/OLAM: OLAM engine and OLAP engine, on top of the data cube API
• Layer 2, MDDB: multidimensional database and metadata, built through filtering & integration on top of the database API
• Layer 1, data repository: databases and data warehouse, with data cleaning, data integration, and filtering