Grade Examined by …………………………………

Module KS091323 BUSINESS INTELLIGENCE
02 Practical Work Report

Prepared by:
01. Rama Catur APP (5207100077)
02. Goeij Yong Sun (5207100098)
03. Arief Rakhman (5207100092)

Information System Department
Faculty of Information Technology
INSTITUT TEKNOLOGI SEPULUH NOPEMBER
2009

Blog URL: theagroupofbi.wordpress.com
Submission date: 22-10-2009

ABSTRACT

The objective of this practical work is to understand OLAP and how to use it in SQL Server 2008. This was done by observing the presentation given by the practical work assistants and by exploring OLAP in SQL Server 2008, formally known as SQL Server Analysis Services (SSAS), through its documentation, with limited coverage of SQL Server Integration Services (SSIS). The result of this practical work is that the students better understand the practice of using the software for business-intelligence-related analysis.

TABLE OF CONTENT

Abstract
Table of Content
Introduction
Literature Review
    Defining Data Sources (Analysis Services)
        Defining a Data Source Using the Data Source Wizard (Analysis Services)
    Designing Dimensions
    Designing Cube
        Understanding the Database Schemas
        Introducing the Schema Generation Wizard
        Defining a Fact Relationship and Fact Relationship Properties
    Using the Data Mining Tools
        Data Mining Wizard (Analysis Services – Data Mining)
        Basic Data Mining Tutorial (Result Section)
Methodology
Practical Work Scenarios
Result
Discussion
Conclusion
References
Introduction

The background of this practical work is the theory we got in class about OLAP. The objective is to understand the use of OLAP in SQL Server 2008 in a business intelligence context.

Literature Review

To support this assignment, we searched the rich documentation on the official website of Microsoft SQL Server 2008 [1]. Microsoft provides an offline version, Microsoft SQL Server Books Online, which can be downloaded from this link. For this assignment, we selected these topics:

– Defining Data Sources (Analysis Services)
– Defining a Data Source Using the Data Source Wizard (Analysis Services)
– Tutorial: Creating a Simple ETL Package (Result Section)
– Designing Dimensions
– Designing Cube
– Understanding the Database Schemas
– Introducing the Schema Generation Wizard
– Defining a Fact Relationship and Fact Relationship Properties
– Using the Data Mining Tools
– Data Mining Wizard (Analysis Services – Data Mining)
– Basic Data Mining Tutorial (Result Section)

Defining Data Sources (Analysis Services)

A Microsoft SQL Server Analysis Services data source is an object that provides the Analysis Services service with the information needed to connect to a source of information for the business intelligence solution. Analysis Services can access data from one or more sources of data, provided that Analysis Services is able to construct the OLAP or data mining queries required by the business intelligence solution.

Defining a Data Source Using the Data Source Wizard (Analysis Services)

We use the Data Source Wizard in Business Intelligence Development Studio to define one or more data sources for a Microsoft SQL Server Analysis Services project. Whether we are working with an Analysis Services project or connected directly to an Analysis Services database, we can define a data source based on a new or an existing connection.
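As an illustration of the connection information such a data source object carries, here is a minimal Python sketch that assembles an OLE DB-style connection string. The provider name SQLNCLI10.1 and the server and database values are assumptions for illustration, not details taken from this report.

```python
def build_connection_string(server, database, integrated_security=True):
    """Assemble an OLE DB-style connection string for SQL Server Native Client."""
    parts = {
        "Provider": "SQLNCLI10.1",      # SQL Server 2008 Native Client (assumed)
        "Data Source": server,          # server or service instance
        "Initial Catalog": database,    # database name
    }
    if integrated_security:
        parts["Integrated Security"] = "SSPI"   # log on as the current account
    return ";".join(f"{key}={value}" for key, value in parts.items())

conn = build_connection_string("localhost", "AdventureWorksDW")
print(conn)
```

The same keyword-value shape applies to the other providers the wizard supports; only the keywords and their required values differ per provider.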
Creating a Data Source Based on a New Connection

The default provider for a new connection is the Native OLE DB\SQL Server Native Client provider. SQL Server Analysis Services supports many different types of providers. For a list of the providers and relational databases supported by SQL Server Analysis Services, see the documentation. After we select a provider, we provide the specific connection information required by that provider to connect to the underlying data. The exact information required depends on the provider selected, but generally includes a server or service instance, information for logging on to the server or service instance, a database or file name, and other provider-specific settings.

Creating a Data Source Based on an Existing Connection

If we have an existing data source defined in an Analysis Services database or project and wish to create a new data source object that connects to the same underlying data source, we can simply copy the properties of the first data source object into a new data source object. We can then specify its own impersonation settings and, after creating the new data source, modify it to change one or more of its properties.

Tutorial: Creating a Simple ETL Package

In this tutorial, we will learn how to use SSIS (SQL Server Integration Services) Designer to create a simple Microsoft SQL Server Integration Services package. The package that we create takes data from a flat file, reformats the data, and then inserts the reformatted data into a fact table. In the following lessons, the package is expanded to demonstrate looping, package configurations, logging, and error flow.

What We Will Learn

The best way to become acquainted with the new tools, controls, and features available in Microsoft SQL Server Integration Services is to use them. This tutorial walks us through SSIS Designer to create a simple ETL package that includes looping, configurations, error flow logic, and logging.
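The extract-reformat-load flow described above can be sketched in a few lines of Python. This is only a toy analogy for the SSIS data flow; the column names and values are invented, not the tutorial's actual package.

```python
def extract(flat_file_text):
    """Parse comma-separated lines into raw records, as read from a flat file."""
    rows = []
    for line in flat_file_text.strip().splitlines():
        product_key, amount = line.split(",")
        rows.append({"ProductKey": int(product_key), "Amount": amount.strip()})
    return rows

def transform(rows):
    """Reformat the raw data: convert the amount to a numeric type."""
    return [{**row, "Amount": float(row["Amount"])} for row in rows]

def load(rows, fact_table):
    """Append the reformatted rows to the fact table."""
    fact_table.extend(rows)

fact_sales = []
load(transform(extract("101, 19.99\n102, 5.00")), fact_sales)
print(fact_sales)
```

In the real package, each of these three steps corresponds to a component in the SSIS data flow: a flat file source, one or more transformations, and an OLE DB destination.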
Lessons in This Tutorial

Lesson 1: Creating the Project and Basic Package
Lesson 2: Adding Looping
Lesson 3: Adding Package Configurations
Lesson 4: Adding Logging
Lesson 5: Adding Error Flow Redirection

The lessons will be explored later in the Result section of this assignment.

Designing Dimensions

A database dimension is a collection of related objects, called attributes, which can be used to provide information about fact data in one or more cubes. For example, typical attributes in a product dimension might be product name, product category, product line, product size, and product price. These objects are bound to one or more columns in one or more tables in a data source view. By default, these attributes are visible as attribute hierarchies and can be used to understand the fact data in a cube. Attributes can be organized into user-defined hierarchies that provide navigational paths to assist users when browsing the data in a cube.

Cubes contain all the dimensions on which users base their analyses of fact data. An instance of a database dimension in a cube is called a cube dimension and relates to one or more measure groups in the cube. A database dimension can be used multiple times in a cube. For example, a fact table can have multiple time-related facts, and a separate cube dimension can be defined to assist in analyzing each time-related fact. However, only one time-related database dimension needs to exist, which also means that only one time-related relational database table needs to exist to support multiple cube dimensions based on time.

Defining Dimensions, Attributes, and Hierarchies

The simplest method for defining database and cube dimensions, attributes, and hierarchies is to use the Cube Wizard to create dimensions at the same time that we define the cube. The Cube Wizard will create dimensions based on the dimension tables in the data source view that the wizard identifies or that we specify for use in the cube.
The wizard then creates the database dimensions and adds them to the new cube, creating cube dimensions. When we create a cube, we can also add to the new cube any dimensions that already exist in the database. These dimensions may have been previously defined for another cube or by the Dimension Wizard. After a database dimension has been defined, we can modify and configure it in Dimension Designer. We can also customize the cube dimension, to a limited extent, in Cube Designer.

Designing Cubes

A cube is a multidimensional structure that contains dimensions and measures. Dimensions define the structure of the cube, and measures provide the numerical values of interest to the end user. As a logical structure, a cube allows a client application to retrieve values as if cells in the cube defined every possible summarized value. Cell positions in the cube are defined by the intersection of dimension members. Dimension hierarchies provide aggregation paths within a cube. Measure values are aggregated at non-leaf levels to provide member values in the dimension hierarchies.

Understanding the Database Schemas

The Schema Generation Wizard generates a denormalized relational schema for the subject area database based on the dimensions and measure groups in Analysis Services. The wizard generates a relational table for each dimension to store dimension data, which is called a dimension table, and a relational table for each measure group to store fact data, which is called a fact table. The wizard ignores linked dimensions, linked measure groups, and server time dimensions when it generates these relational tables.

Validation

Before it begins to generate the underlying relational schema, the Schema Generation Wizard validates the Analysis Services cubes and dimensions. If the wizard detects errors, it stops and reports the errors to the Task List window in Business Intelligence Development Studio.
Examples of errors that prevent generation include the following:

– Dimensions that have more than one key attribute.
– Parent attributes that have different data types than the key attributes.
– Measure groups that do not have measures.
– Degenerate dimensions or measures that are improperly configured.

Dimension Tables

For each dimension, the Schema Generation Wizard generates a dimension table to be included in the subject area database. The structure of the dimension table depends on the choices made while designing the dimension on which it is based.

Columns

The wizard generates one column for the bindings associated with each attribute in the dimension on which the dimension table is based, such as the bindings for the KeyColumns, NameColumn, ValueColumn, CustomRollupColumn, CustomRollupPropertiesColumn, and UnaryOperatorColumn properties of each attribute.

Relationships

The wizard generates a relationship between the column for each parent attribute and the primary key of the dimension table. The wizard also generates a relationship to the primary key in each additional dimension table defined as a referenced dimension in the cube, if applicable.

Constraints

The wizard generates a primary key constraint, by default, for each dimension table based on the key attribute of the dimension. If the primary key constraint is generated, a separate name column is generated by default. A logical primary key is created in the data source view even if we decide not to create the primary key in the database.

Translations

The wizard generates a separate table to hold the translated values for any attribute that requires a translation column. The wizard also creates a separate column for each of the required languages.

Fact Tables

For each measure group in a cube, the Schema Generation Wizard generates a fact table to be included in the subject area database.
The structure of the fact table depends on the choices made while designing the measure group on which it is based, and on the relationships established between the measure group and any included dimensions.

Columns

The wizard generates one column for each measure, except for measures that use the Count aggregation function. Such measures do not require a corresponding column in the fact table. The wizard also generates one column for each granularity attribute column of each regular dimension relationship on the measure group, and one or more columns for the bindings associated with each attribute of a dimension that has a fact dimension relationship to the measure group on which this table is based, if applicable.

Relationships

The wizard generates one relationship for each regular dimension relationship from the fact table to the dimension table's granularity attribute. If the granularity is based on the key attribute of the dimension table, the relationship is created in the database and in the data source view. If the granularity is based on another attribute, the relationship is created only in the data source view. If we chose to generate indexes in the wizard, a non-clustered index is generated for each of these relationship columns.

Constraints

Primary keys are not generated on fact tables. If we chose to enforce referential integrity, referential integrity constraints are generated between dimension tables and fact tables where applicable.

Translations

The wizard generates a separate table to hold the translated values for any property in the measure group that requires a translation column. The wizard also creates a separate column for each of the required languages.

Microsoft Integration Services is a platform for building high-performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing.
Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service, for administering package execution and storage; and application programming interfaces (APIs) for programming the Integration Services object model.

Introducing the Schema Generation Wizard

When we design our dimensions and cubes by using the top-down method in Business Intelligence Development Studio, we create dimension and cube definitions in a Microsoft SQL Server Analysis Services project and then use the Schema Generation Wizard to generate a data source view, a data source, and the underlying relational database schema that supports these OLAP objects. This relational database is referred to as the subject area database.

After the Schema Generation Wizard has generated the underlying objects based on the design of our dimensions and cubes in an Analysis Services instance or project, we can change the design of the dimensions and cubes, and then rerun the Schema Generation Wizard to regenerate the underlying objects based on the modified design. When the underlying objects are regenerated, the Schema Generation Wizard incorporates the changes into the underlying objects and, as much as possible, preserves the data contained in the underlying databases.

Defining a Fact Relationship and Fact Relationship Properties

When we define a new cube dimension or a new measure group, Analysis Services will try to detect whether a fact dimension relationship exists and, if so, set the dimension usage setting to Fact. We can view or edit a fact dimension relationship on the Dimension Usage tab of Cube Designer.
The fact relationship between a dimension and a measure group has the following constraints:

– A cube dimension can have only one fact relationship to a particular measure group.
– A cube dimension can have separate fact relationships to multiple measure groups.
– The granularity attribute for the relationship must be the key attribute (such as Transaction Number) for the dimension. This creates a one-to-one relationship between the dimension and facts in the fact table.

Using the Data Mining Tools

Microsoft SQL Server Analysis Services provides tools that we can use to create data mining solutions to address specific business problems. In Business Intelligence Development Studio, the Data Mining Wizard makes it easy to create mining structures and mining models that are based on OLAP and relational data sources. We can use the wizard to define structures and models that use specific data mining techniques to analyze our data. We can also use Data Mining Designer to refine our mining models, and to explore and work with the results of the models. SQL Server Management Studio provides tools that we can use to manage and explore our mining models after they are created. SQL Server Integration Services contains tools that we can use to clean data, to automate tasks such as creating predictions and updating models, and to create text mining solutions.

Data Mining Wizard

The Data Mining Wizard is the entry point within Business Intelligence Development Studio for creating data mining solutions. The wizard is designed to guide us through the process of creating a data mining structure and an initial related mining model, and includes the tasks of selecting an algorithm type and a data source, and defining a case table.

Data Mining Designer

After we use the Data Mining Wizard to create a mining structure and an initial mining model, the Data Mining Designer opens.
In the designer, we can manage our mining structures, create new mining models, and deploy, browse, compare, and create predictions against existing mining models.

SQL Server Management Studio

After we create and deploy mining models to a server, we can use SQL Server Management Studio to perform management and exploratory tasks, such as viewing and processing the models, and creating predictions against them. Management Studio also contains a query editor that we can use to design and execute Data Mining Extensions (DMX) queries.

Integration Services Data Mining Tasks and Transformations

SQL Server Integration Services provides tools that we can use to automate common data mining tasks, such as processing a mining model and creating prediction queries. For example, if we have a mining model that is built from a dataset of potential customers, we could create an Integration Services package that automatically updates the model every time the dataset is updated with new customers. We could then use the package to create a prediction, separating the potential customers into two tables: one table could contain likely customers, and the other could contain customers who are not likely to purchase any products.

Data Mining Wizard (Analysis Services – Data Mining)

The Data Mining Wizard in Microsoft SQL Server Analysis Services starts every time we add a new mining structure to a data mining project. The wizard helps us define new mining structures and choose the data sources that we will use for data mining. The wizard can also partition the data in the mining structure into training and testing sets, and help us add an initial mining model for each structure. The content of a mining structure is derived from an existing data source view or cube. We can choose which columns to include in the mining structure. All models that are based on that structure can use those columns.
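The "separate the potential customers into two tables" idea above can be sketched as a simple routing step. The scoring rule and threshold here are invented stand-ins; in practice the score would come from a prediction query against the deployed mining model.

```python
def partition_customers(customers, score, threshold=0.5):
    """Route each customer into a likely or unlikely table by predicted score."""
    likely, unlikely = [], []
    for customer in customers:
        (likely if score(customer) >= threshold else unlikely).append(customer)
    return likely, unlikely

# Stand-in scoring rule (invented for illustration): younger customers score higher.
score = lambda c: 0.8 if c["age"] < 40 else 0.2
customers = [{"id": 1, "age": 30}, {"id": 2, "age": 55}]
likely, unlikely = partition_customers(customers, score)
print([c["id"] for c in likely], [c["id"] for c in unlikely])  # [1] [2]
```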
We can enable users of a data mining model to drill down from the results of the mining model to see additional mining structure columns that were not included in the mining model itself.

We must make the following decisions when we create a data mining structure and model by using the Data Mining Wizard:

– Whether to build the data mining structure and models from a relational database or from an existing cube in an OLAP database.
– How much data to use in training, and how much to set aside for testing. When we partition a mining structure into training and testing data sets, all models that are based on that structure can use that testing set.
– Which columns or attributes to use for prediction, and which columns or attributes to use as input for analysis. Each structure must also contain a key that uniquely identifies a case record.
– Which algorithm to use. The algorithms provided in SQL Server Analysis Services have different characteristics and produce different results. We can create multiple models using different algorithms, or change parameters for the algorithms to create different models.

The Data Mining Wizard provides functionality to help us make these decisions:

– Wizard pages in which we define the case set. We can choose case tables and nested tables from a relational data source, or choose an OLAP data source and then select the case key and case level columns, and then optionally set filters on the cube.
– Dialog boxes that analyze the data in columns and recommend usage for the columns.
– Auto-detection of column content and data types.
– Automatic slicing of the cube, if our mining model is based on an OLAP data source.

After we complete the Data Mining Wizard, we use Data Mining Designer to modify the mining structure and models, to view the accuracy of the model, to view characteristics of the structure and models, or to make predictions by using the models.
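The training/testing decision above amounts to a holdout partition. A minimal sketch, assuming a percentage-based holdout with a seeded shuffle so the partition is reproducible (mirroring how the partition definition is stored with the mining structure):

```python
import random

def holdout_split(cases, test_percentage=30, seed=42):
    """Reserve a percentage of cases for testing; the rest train the model."""
    rng = random.Random(seed)           # seeded so the partition is reproducible
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_test = len(shuffled) * test_percentage // 100
    return shuffled[n_test:], shuffled[:n_test]    # (training, testing)

train, test = holdout_split(range(100))
print(len(train), len(test))  # 70 30
```

Every model built on the same structure then evaluates against the same reserved testing set.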
Using the Data Mining Wizard

To start the Data Mining Wizard, add a new mining structure to an Analysis Services project by using Solution Explorer or the Project menu in Business Intelligence Development Studio. The Data Mining Wizard has two branches, depending on whether our data source is relational or a cube:

– Relational Mining Models
– OLAP Mining Models

Relational Mining Models

When we build a mining model from a relational data source in Analysis Services, we first specify in the Data Mining Wizard that we want to use an existing relational database to define the structure of the model. We also have the option of creating just the mining structure, or creating the mining structure and one associated data mining model. If we choose to create a mining model, we must specify the data mining technique to use, by selecting the algorithm that is most appropriate for the type of data mining analysis that we want.

Specifying the Data Source View and Table Types

The next steps in the wizard are to select the specific data source view that we want to use to define the mining structure, and to specify a case table. The case table will be used for training the data mining model, and optionally for testing it as well. We can also specify a nested table.

Selecting the case table is an important decision. The case table should contain the entities that we want to analyze: for example, customers and their demographic information. The nested table usually contains additional information about the entities in the case table, such as transactions conducted by the customer, or attributes that have a many-to-one relationship with the entity. For example, a nested table joined to the Customers case table might include a list of products purchased by each customer, or a list of hobbies.
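The case-table/nested-table relationship above can be illustrated with toy data: each case is a customer, and the nested rows are that customer's purchases, joined on the case key. All names and values here are invented.

```python
case_table = [
    {"CustomerKey": 1, "Age": 34, "Region": "Pacific"},
    {"CustomerKey": 2, "Age": 51, "Region": "Europe"},
]
nested_purchases = [
    {"CustomerKey": 1, "Product": "Road Bike"},
    {"CustomerKey": 1, "Product": "Helmet"},
    {"CustomerKey": 2, "Product": "Water Bottle"},
]

def attach_nested(cases, nested, key):
    """Join nested rows to each case on the case key."""
    joined = []
    for case in cases:
        rows = [r for r in nested if r[key] == case[key]]
        joined.append({**case, "Purchases": [r["Product"] for r in rows]})
    return joined

joined = attach_nested(case_table, nested_purchases, "CustomerKey")
for case in joined:
    print(case["CustomerKey"], case["Purchases"])
```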
For More Information: Nested Tables (Analysis Services - Data Mining)

Specifying the Column Usage

After we specify the case table and the nested tables, we determine the usage type for each column in the tables that we will include in the mining structure. If we do not specify a usage type for a column, the column will not be included in the mining structure. Data mining columns can be one of four types: key, input, predictable, or a combination of input and predictable.

Key columns contain a unique identifier for each row in a table. Some mining models, such as those based on the sequence clustering or time series algorithms, can contain multiple key columns. However, these multiple keys are not compound keys in the relational sense, but instead must be selected so as to provide support for time series and sequence clustering analysis. For more information, see Microsoft Time Series Algorithm or Microsoft Sequence Clustering Algorithm.

Input columns provide the information from which predictions are made. Predictable columns contain the information that we try to predict in the mining model.

For example, a series of tables may contain customer IDs, demographic information, and the amount of money each customer spends at a specific store. The customer ID uniquely identifies the customer and also relates the case table to the nested tables; therefore, we would use the customer ID as the key column. We could use a selection of columns from the demographic information as input columns, and the column that describes the amount of money each customer spends as a predictable column. We could then build a mining model that relates demographics to how much money a customer spends in a store. We could use this model as the basis for targeted marketing.

The Data Mining Wizard provides the Suggest feature, which is enabled when we select a predictable column. Datasets frequently contain more columns than we need to build a mining model.
The Suggest feature calculates a numeric score, from 0 to 1, that describes the relationship between each column in the dataset and the predictable column. Based on this score, the feature suggests columns to use as input for the mining model. If we use the Suggest feature, we can use the suggested columns, modify the selections to fit our needs, or ignore the suggestions.

Specifying the Content and Data Types

After we select one or more predictable columns and input columns, we can specify the content and data types for each column.

Split Data into Training and Testing Sets

The final step before we complete the wizard is to partition our data into training and testing sets. The ability to hold out a portion of the data for testing is new in SQL Server 2008 and provides an easy-to-use mechanism for ensuring that a consistent set of test data is available for use with all mining models associated with the new mining structure. We can specify that a certain percentage of the data be used for testing, with all remaining data used for training, or we can specify the number of cases to use for testing. The definition of the partition is stored with the mining structure, so that whenever we create a new model based on the structure, the testing data set will be available for assessing the accuracy of the model.

For More Information: Validating Data Mining Models (Analysis Services - Data Mining), Partitioning Data into Training and Testing Sets (Analysis Services - Data Mining)

Completing the Wizard

The last step in the wizard is to name the mining structure and the associated mining model. If we select Allow drill through, the drill-through functionality is enabled in the model. This lets users who have the appropriate permissions explore the source data that is used to build the model.
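The Suggest feature's 0-to-1 score mentioned above can be loosely illustrated with a correlation measure. To be clear, the wizard's actual scoring method is not documented in this report; using the absolute Pearson correlation between each candidate column and the predictable column is only our analogy.

```python
def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two numeric columns, in [0, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (vx * vy)) if vx and vy else 0.0

def suggest(columns, target, threshold=0.5):
    """Return column names whose score against the target clears the threshold."""
    scores = {name: abs_correlation(values, target) for name, values in columns.items()}
    return {name: round(s, 2) for name, s in scores.items() if s >= threshold}

columns = {"Age": [25, 35, 45, 55], "ShoeSize": [9, 7, 10, 8]}
spend = [10, 20, 30, 40]
print(suggest(columns, spend))  # Age tracks spend perfectly; ShoeSize does not
```

As in the wizard, we remain free to accept, adjust, or ignore such suggestions.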
OLAP Mining Models

When we build a multidimensional mining model from an OLAP data source in Analysis Services, we first specify in the Data Mining Wizard that we want to use an existing cube to define the structure of the model. We have the option of creating just the mining structure, or creating the mining structure plus one associated data mining model. If we choose to create a mining model, we must specify the data mining technique to use, by selecting the algorithm that is most appropriate for our business problem.

Specifying the Data Source and Case Key

Next, we select the cube dimension to use as the data source to define the mining structure. Then we select an attribute to use as the key, or case key, of the mining model.

Specifying Case Level Columns and Column Usage

After we select a case key, the attributes and measures that are associated with that key are displayed in a tree view on the next page of the wizard. From this list, we select the attributes and measures to be used as the columns of the structure. These columns are known as case level columns. As with a relational model, we must also specify how each column should be used in the structure, which we can do on the next page of the wizard. Columns can be key, input, predictable, input and predictable, or unselected.

Adding Nested Tables

The OLAP branch of the Data Mining Wizard includes the option to add nested tables to the mining model structure. On the Specify Mining Model Column Usage page of the wizard, click Add Nested Tables to open a separate dialog box that guides us through the steps to add nested tables. Only the measure groups that apply to the dimension are displayed. Select a measure group that contains the foreign key of the case dimension. Next, specify the usage for each column in the measure group, either input or predictable. The wizard then adds the nested table to the case table.
The default name for the nested table is the nested dimension name, but we can rename the nested table and its columns.

For More Information: Nested Tables (Analysis Services - Data Mining)

Specifying the Content and Data Types

After we select one or more predictable columns and input columns, we can specify the content and data types for each column.

Slicing the Source Cube

In the OLAP branch of the wizard, we can limit the scope of our mining model by slicing the source cube before we train the mining model. Slicing the cube is similar to adding a WHERE clause to an SQL statement. For example, if a cube contains information about the purchase of products, we might limit an age attribute to more than 30, a gender column to only female, and a purchase date to no earlier than March 2000. In this way, we can limit the model's scope to women older than 30 who bought a product after March 2000.

Split Data into Training and Testing Sets

The final step before we complete the wizard is to partition the data that is available from the cube into training and testing sets. The definition of the partition is stored with the mining structure, so that whenever we create a new model based on the structure, the testing data set will be available for assessing the accuracy of the model.

Completing the Wizard

The last step in the wizard is to name the mining structure and the associated mining model. If we select Allow drill through, the drill-through functionality is enabled in the model. This lets users who have the appropriate permissions explore the source data that is used to build the model. We can also specify whether we want to add a new dimension to the source cube based on the mining model, or create a new cube from the mining model.

For more concrete practice, we apply these theories in the Basic Data Mining Tutorial in the Result section of this assignment.
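The WHERE-clause analogy for cube slicing above can be sketched as a plain case filter: keep only the cases matching the example conditions (female, older than 30, purchased after March 2000). Field names and the sample records are illustrative, not real cube data.

```python
from datetime import date

cases = [
    {"gender": "F", "age": 42, "purchase_date": date(2001, 5, 1)},
    {"gender": "M", "age": 42, "purchase_date": date(2001, 5, 1)},
    {"gender": "F", "age": 25, "purchase_date": date(2001, 5, 1)},
    {"gender": "F", "age": 42, "purchase_date": date(1999, 5, 1)},
]

def slice_cases(cases):
    """Apply the example slice: female, over 30, bought after March 2000."""
    cutoff = date(2000, 3, 1)
    return [c for c in cases
            if c["gender"] == "F" and c["age"] > 30 and c["purchase_date"] > cutoff]

print(len(slice_cases(cases)))  # only the first case survives: 1
```

Training then proceeds only on the sliced cases, just as the sliced cube limits the model's scope.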
Methodology

We took our case study from a tutorial in Microsoft SQL Server 2008 Books Online: the Basic Data Mining Tutorial. In this tutorial we complete a scenario for a targeted mailing campaign, creating three models for analyzing customer purchasing behavior and targeting potential buyers. The tutorial demonstrates how to use the data mining algorithms, mining model viewers, and data mining tools that are included in Microsoft SQL Server Analysis Services. The fictitious company Adventure Works Cycles is used for all examples. We also added some steps for creating a cube based on the dimensions that were determined earlier.

Tutorial Scenario

In this tutorial, we are employees of Adventure Works Cycles who have been tasked with learning more about the company's customers based on historical purchases, and then using that historical data to make predictions that can be used in marketing. The company has never done data mining before, so we must create a new database specifically for data mining and set up several data mining models. The tutorial is divided into the following lessons:

Lesson 1: Preparing the Analysis Services Database
Lesson 2: Building a Targeted Mailing Structure
Lesson 3: Adding and Processing Models
Lesson 4: Exploring the Targeted Mailing Models

Practical Work Scenario

The practical work was done on Friday the 16th, 2009, from 10.00 to 11.00; it finished earlier than planned. We did not use any software or hardware during the session itself, and only listened to the presentation given by the assistants. Afterwards, we were assigned to explore the software ourselves and document the results in this report.

Result

Lesson 1: Preparing the Analysis Services Database (Basic Data Mining Tutorial)

In this lesson, we learn how to create a new Analysis Services database, add a data source and a data source view, and prepare the new database for data mining.
Creating a new project: choose Analysis Services Project.

To add a data source, right-click the Data Sources folder and choose New Data Source..., then click Next. Select the method of connection and create a new connection; here we use localhost as the server. Click OK, select Use the service account, and click Next. The Completing the Wizard page lets us review the settings. Note that a new data source now appears in the Data Sources folder.

To create a data source view, right-click the Data Source Views folder, choose New Data Source View, and click Next. Select the existing data source that we created before, select the tables and views we want to include in the data source view, and click Next. Name the new data source view 'Targeted Mailing', then click Finish. The data source view is now ready.

Lesson 2: Building a Targeted Mailing Structure (Basic Data Mining Tutorial)

In this lesson, we learn how to create a mining structure that can be used as part of a targeted mailing scenario. First, right-click the Mining Structures folder, choose New Mining Structure..., and click Next. Select the definition method; here we choose the definition from an existing relational database. (This differs from the assignment, which should start from a cube, so we build the cube after these lessons.) Next, specify the data mining structure; here we choose Microsoft Decision Trees as the algorithm. Then we select the data source view that we created: Targeted Mailing. We specify the table types, choosing the vTargetMail view as the case table, and then specify the training data; we took the sample column selections from the tutorial. We can also let the software suggest suitable columns. We specify which columns are inputs, which are predictable, and any additional columns that are unrelated but that we still want to include. Finally, we specify the content type and data type for each column and create the testing set. On the Completing the Wizard page we can review the settings; the result is shown in the picture above.
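The "create testing set" step at the end of the wizard can be sketched as a simple random holdout partition. The 30% holdout fraction, the seed, and the case IDs below are illustrative assumptions; the key property, that the partition is fixed once and reused for every model in the structure, comes from the tutorial.

```python
import random

# Minimal sketch of the wizard's "create testing set" step:
# a deterministic random holdout (here 30% testing), fixed once
# so every model built on the structure sees the same split.
# The fraction, seed, and case IDs are illustrative assumptions.

def split_cases(case_ids, test_fraction=0.3, seed=42):
    """Shuffle deterministically, then hold out a testing set."""
    rng = random.Random(seed)
    ids = list(case_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * test_fraction)
    return ids[cut:], ids[:cut]   # (training, testing)

train, test = split_cases(range(100))
# Because the seed is fixed, repeating the call reproduces the
# exact same partition, so the testing set can be reused to
# assess every model added to the structure later.
```

Storing the partition with the structure, rather than re-sampling per model, is what makes accuracy comparisons between the three models in Lesson 3 fair.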
Lesson 3: Adding and Processing Models

In this lesson we learn how to add models to a structure. The models we create are built with the following algorithms:
a. Microsoft Decision Trees
b. Microsoft Clustering
c. Microsoft Naive Bayes

Click the Mining Models tab, right-click the column, and select New Mining Model... Enter the name of the model; here we enter TM_Clustering, because this model will use the Microsoft Clustering algorithm. After we click OK, we see the result shown in the picture above. In the same way, we add a model based on the Microsoft Naive Bayes algorithm. With the models defined, we process them by clicking Mining Model -> Process..., then Run; the dialog shown above indicates the processing progress.

Lesson 4: Exploring the Targeted Mailing Models (Basic Data Mining Tutorial)

In this lesson we learn how to explore and interpret the findings of each model using the viewers. To explore the models, we select the Mining Model Viewer tab; there are dedicated viewers for the clustering model and for the Naive Bayes model as well. To see the accuracy chart, click the Mining Accuracy Chart tab, set the predict value to 1, and choose Use mining structure test cases. The lift chart then shows which algorithm is the most appropriate; in this comparison, we see that Decision Trees performs best.

Creating a Cube

For more advanced multidimensional analysis, we create a cube. Right-click the Cubes folder and choose New Cube..., then click Next. Select the creation method; here we choose to use existing tables. From the data source view 'Targeted Mailing' we identify its two tables and select them. Then we select the measures; here we select all of the available options. Next we select the existing dimensions, then the new dimensions. On the Completing the Wizard page we name the cube and review the choices we made. The two pictures above show the resulting cube.
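What the lift chart compares can be sketched in a few lines: rank the test cases by each model's predicted probability of the target (for example, BikeBuyer = 1) and count how many true buyers land in the top of the ranking. The scores and labels below are made-up illustrative data, not output from the actual models.

```python
# Toy sketch of the idea behind the lift chart: rank test cases by
# a model's predicted probability of the target and count how many
# actual positives fall in the top fraction of the ranking.
# All scores and labels here are made-up illustrative data.

def hits_in_top(scores, labels, fraction=0.5):
    """True positives captured when mailing only the top fraction."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    top = ranked[:int(len(ranked) * fraction)]
    return sum(label for _, label in top)

labels = [1, 0, 1, 1, 0, 0, 1, 0]            # actual buyers (1 = bought)
tree_scores  = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]  # well-ranked model
naive_scores = [0.6, 0.7, 0.5, 0.4, 0.8, 0.2, 0.3, 0.1]  # weaker model

tree_hits = hits_in_top(tree_scores, labels)
naive_hits = hits_in_top(naive_scores, labels)
# The first model captures more buyers in its top half, i.e. more lift.
```

A model whose curve captures more positives with a smaller mailing fraction sits higher on the lift chart, which is why the chart lets us pick the most appropriate algorithm for the targeted mailing.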
Discussion

In this practical work we encountered many problems connecting SQL Server to the database we used as the data source, possibly caused by incorrect installation steps in the previous practical work. We completed this practical work by following the tutorial provided with Microsoft SQL Server 2008, so the underlying theory is not yet entirely clear to us. We hope the next practical work helps us understand more.

Conclusion

We now have the big picture of using the Analysis Services provided by Microsoft SQL Server 2008. The main points are:
Defining Data Sources (Analysis Services)
Defining a Data Source Using the Data Source Wizard (Analysis Services)
Designing Dimensions
Designing Cubes
Understanding the Database Schemas
Introducing the Schema Generation Wizard
Defining a Fact Relationship and Fact Relationship Properties
Using the Data Mining Tools
Data Mining Wizard (Analysis Services - Data Mining)
Basic Data Mining Tutorial (Result section)

References

[1] Microsoft. (2009). Download details: Microsoft SQL Server 2008 Books Online (July 2009). Accessed 2009, from www.microsoft.com: http://www.microsoft.com/downloads/thankyou.aspx?familyId=765433f7-0983-4d7a-b628-0a98145bcb97&displayLang=en