Grade
Examined by …………………………………
Module
KS091323
BUSINESS INTELLIGENCE
02
Practical Work Report
Prepared By
01. Rama Catur APP (5207100077)
02. Goeij Yong Sun (5207100098)
03. Arief Rakhman (5207100092)
Information System Department
Faculty of Information Technology
INSTITUT TEKNOLOGI SEPULUH NOPEMBER
2009
Blog URL
theagroupofbi.wordpress.com
Submission date
22-10-2009
ABSTRACT
The objective of this practical work is to understand OLAP and how to use it in
SQL Server 2008. This was done by observing the presentation given by the
practical work assistants and by exploring OLAP in SQL Server 2008, formally
known as SQL Server Analysis Services (SSAS) in its documentation, with
limited coverage of SQL Server Integration Services (SSIS). The result of this
practical work is that the students better understand the practice of using the
software for business intelligence related analysis.
TABLE OF CONTENTS
Abstract
Table of Contents
Introduction
Literature Review
    Defining Data Sources (Analysis Services)
    Defining a Data Source Using the Data Source Wizard (Analysis Services)
    Tutorial: Creating a Simple ETL Package
    Designing Dimensions
    Designing Cubes
    Understanding the Database Schemas
    Introducing the Schema Generation Wizard
    Defining a Fact Relationship and Fact Relationship Properties
    Using the Data Mining Tools
    Data Mining Wizard (Analysis Services – Data Mining)
    Basic Data Mining Tutorial (Result Section)
Methodology
Practical Work Scenarios
Result
Discussion
Conclusion
References
Introduction
The background of the practical work is the theory we learned in class about
OLAP. The objective is to understand the use of OLAP in SQL Server 2008 in a
business intelligence context.
Literature Review
To support this assignment, we searched the rich documentation on the
official Microsoft SQL Server 2008 website [1], which provides an offline
version of Microsoft SQL Server Books Online that can be downloaded from the
link in the references. For this assignment, we selected these topics:
– Defining Data Sources (Analysis Services)
– Defining a Data Source Using the Data Source Wizard (Analysis Services)
– Tutorial: Creating a Simple ETL Package (Result Section)
– Designing Dimensions
– Designing Cubes
– Understanding the Database Schemas
– Introducing the Schema Generation Wizard
– Defining a Fact Relationship and Fact Relationship Properties
– Using the Data Mining Tools
– Data Mining Wizard (Analysis Services – Data Mining)
– Basic Data Mining Tutorial (Result Section)
Defining Data Sources (Analysis Services)
A Microsoft SQL Server Analysis Services data source is an object that provides
the Analysis Services service with the information needed for it to connect to a
source of information for the business intelligence solution. Analysis Services
can access data from one or more sources of data, provided that Analysis
Services is able to construct the OLAP or data mining queries required by the
business intelligence solution.
Defining a Data Source Using the Data Source Wizard (Analysis Services)
We use the Data Source Wizard in Business Intelligence Development Studio to
define one or more data sources for a Microsoft SQL Server Analysis Services
project.
Whether we are working with an Analysis Services project or connected directly
to an Analysis Services database, we can define a data source based on a new
or an existing connection.
Creating a Data Source Based on a New Connection
The default provider for a new connection is the Native OLE DB\SQL Server
Native Client provider. SQL Server Analysis Services supports many different
types of providers. For a list of the providers and relational databases
supported by SQL Server Analysis Services, see the documentation.
After we select a provider, we provide specific connection information required
by that provider to connect to the underlying data. The exact information
required depends upon the provider selected, but generally such information
includes a server or service instance, information for logging on to the server or
service instance, a database or file name, and other provider-specific settings.
Creating a Data Source Based on an Existing Connection
If we have an existing data source defined in an Analysis Services database or
project and wish to create a new data source object that connects to the same
underlying data source, we can simply copy properties of the first data source
object into a new data source object. We can then specify its own
impersonation settings and, after creating the new data source, modify it to
change one or more of its properties.
Tutorial: Creating a Simple ETL Package
In this tutorial, we will learn how to use SSIS (SQL Server Integration Services)
Designer to create a simple Microsoft SQL Server Integration Services package.
The package that we create takes data from a flat file, reformats the data, and
then inserts the reformatted data into a fact table. In the following lessons, the
package will be expanded to demonstrate looping, package configurations,
logging and error flow.
What We Will Learn
The best way to become acquainted with the new tools, controls and features
available in Microsoft SQL Server Integration Services is to use them. This
tutorial walks us through SSIS Designer to create a simple ETL package that
includes looping, configurations, error flow logic and logging.
Lessons in This Tutorial
Lesson 1: Creating the Project and Basic Package
Lesson 2: Adding Looping
Lesson 3: Adding Package Configurations
Lesson 4: Adding Logging
Lesson 5: Adding Error Flow Redirection
The lessons will be explored later in the Result section of this assignment.
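Although the tutorial runs packages from the designer, a finished package can also be executed from the command line with the dtexec utility; a minimal sketch, with a hypothetical file path:

    dtexec /F "C:\SSIS Tutorial\Lesson 1.dtsx"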
Designing Dimensions
A database dimension is a collection of related objects, called attributes, which
can be used to provide information about fact data in one or more cubes. For
example, typical attributes in a product dimension might be product name,
product category, product line, product size, and product price. These objects
are bound to one or more columns in one or more tables in a data source view.
By default, these attributes are visible as attribute hierarchies and can be used
to understand the fact data in a cube. Attributes can be organized into
user-defined hierarchies that provide navigational paths to assist users when
browsing the data in a cube.
Cubes contain all the dimensions on which users base their analyses of fact
data. An instance of a database dimension in a cube is called a cube dimension
and relates to one or more measure groups in the cube. A database dimension
can be used multiple times in a cube. For example, a fact table can have
multiple time-related facts, and a separate cube dimension can be defined to
assist in analyzing each time-related fact. However, only one time-related
database dimension needs to exist, which also means that only one
time-related relational database table needs to exist to support multiple cube
dimensions based on time.
Defining Dimensions, Attributes, and Hierarchies
The simplest method for defining database and cube dimensions, attributes,
and hierarchies is to use the Cube Wizard to create dimensions at the same
time that we define the cube. The Cube Wizard will create dimensions based on
the dimension tables in the data source view that the wizard identifies or that
we specify for use in the cube. The wizard then creates the database
dimensions and adds them to the new cube, creating cube dimensions.
When we create a cube, we can also add to the new cube any dimensions that
already exist in the database. These dimensions may have been previously
defined for another cube or by the Dimension Wizard. After a database
dimension has been defined, we can modify and configure the database
dimension in Dimension Designer. We can also customize the cube dimension,
to a limited extent, in Cube Designer.
Designing Cubes
A cube is a multidimensional structure that contains dimensions and
measures. Dimensions define the structure of the cube, and measures provide
the numerical values of interest to the end user. As a logical structure, a cube
allows a client application to retrieve values as if cells in the cube defined every
possible summarized value. Cell positions in the cube are defined by the
intersection of dimension members. Dimension hierarchies provide aggregation
paths within a cube. Measure values are aggregated at non-leaf levels to
provide member values in the dimension hierarchies.
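To make this concrete, a client tool typically retrieves such aggregated values with an MDX query; a minimal sketch, assuming cube, measure, and hierarchy names in the style of the AdventureWorks samples:

    -- Total sales amount broken down by calendar year
    SELECT
        { [Measures].[Sales Amount] } ON COLUMNS,
        [Date].[Calendar Year].MEMBERS ON ROWS
    FROM [Adventure Works]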
Understanding the Database Schemas
The Schema Generation Wizard generates a denormalized relational schema for
the subject area database based on the dimensions and measure groups in
Analysis Services. The wizard generates a relational table for each dimension to
store dimension data, which is called a dimension table, and a relational table
for each measure group to store fact data, which is called a fact table. The
wizard ignores linked dimensions, linked measure groups, and server time
dimensions when it generates these relational tables.
Validation
Before it begins to generate the underlying relational schema, the Schema
Generation Wizard validates the Analysis Services cubes and dimensions. If the
wizard detects errors, it stops and reports the errors to the Task List window in
Business Intelligence Development Studio. Examples of errors that prevent
generation include the following:
– Dimensions that have more than one key attribute.
– Parent attributes that have different data types than the key attributes.
– Measure groups that do not have measures.
– Degenerate dimensions or measures that are improperly configured.
Dimension Tables
For each dimension, the Schema Generation Wizard generates a dimension
table to be included in the subject area database. The structure of the
dimension table depends on the choices made while designing the dimension
on which it is based.
Columns
The wizard generates one column for the bindings associated with each
attribute in the dimension on which the dimension table is based, such
as the bindings for the KeyColumns, NameColumn, ValueColumn,
CustomRollupColumn, CustomRollupPropertiesColumn, and
UnaryOperatorColumn properties of each attribute.
Relationships
The wizard generates a relationship between the column for each parent
attribute and the primary key of the dimension table.
The wizard also generates a relationship to the primary key in each
additional dimension table defined as a referenced dimension in the cube,
if applicable.
Constraints
The wizard generates a primary key constraint, by default, for each
dimension table based on the key attribute of the dimension. If the
primary key constraint is generated, a separate name column is
generated by default. A logical primary key is created in the data source
view even if we decide not to create the primary key in the database.
Translations
The wizard generates a separate table to hold the translated values for
any attribute that requires a translation column. The wizard also creates
a separate column for each of the required languages.
Fact Tables
For each measure group in a cube, the Schema Generation Wizard generates a
fact table to be included in the subject area database. The structure of the fact
table depends on the choices made while designing the measure group on
which it is based, and the relationships established between the measure
group and any included dimensions.
Columns
The wizard generates one column for each measure, except for measures
that use the Count aggregation function. Such measures do not require a
corresponding column in the fact table.
The wizard also generates one column for each granularity attribute
column of each regular dimension relationship on the measure group,
and one or more columns for the bindings associated with each attribute of
a dimension that has a fact dimension relationship to the measure group
on which this table is based, if applicable.
Relationships
The wizard generates one relationship for each regular dimension
relationship from the fact table to the dimension table's granularity
attribute. If the granularity is based on the key attribute of the
dimension table, the relationship is created in the database and in the
data source view. If the granularity is based on another attribute, the
relationship is created only in the data source view.
If we chose to generate indexes in the wizard, a non-clustered index is
generated for each of these relationship columns.
Constraints
Primary keys are not generated on fact tables.
If we chose to enforce referential integrity, referential integrity
constraints are generated between dimension tables and fact tables
where applicable.
Translations
The wizard generates a separate table to hold the translated values for
any property in the measure group that requires a translation column.
The wizard also creates a separate column for each of the required
languages.
Microsoft Integration Services is a platform for building high performance data
integration solutions, including extraction, transformation, and load (ETL)
packages for data warehousing. Integration Services includes graphical tools
and wizards for building and debugging packages; tasks for performing
workflow functions such as FTP operations, executing SQL statements, and
sending e-mail messages; data sources and destinations for extracting and
loading data; transformations for cleaning, aggregating, merging, and copying
data; a management service, the Integration Services service for administering
package execution and storage; and application programming interfaces (APIs)
for programming the Integration Services object model.
Introducing the Schema Generation Wizard
When we design our dimensions and cubes by using the top-down method in
Business Intelligence Development Studio, we create dimension and cube
definitions in a Microsoft SQL Server Analysis Services project and then use the
Schema Generation Wizard to generate a data source view, a data source, and
the underlying relational database schema that supports these OLAP objects.
This relational database is referred to as the subject area database.
After the Schema Generation Wizard has generated the underlying objects
based on the design of our dimensions and cubes in an Analysis Services
instance or in an Analysis Services project, we can change the design of the
dimensions and cubes, and then rerun the Schema Generation Wizard to
regenerate the underlying objects based on the modified design. When the
underlying objects are regenerated, the Schema Generation Wizard
incorporates the changes into the underlying objects and, as much as is
possible, preserves the data contained in the underlying databases.
Defining a Fact Relationship and Fact Relationship Properties
When we define a new cube dimension or a new measure group, Analysis
Services will try to detect if a fact dimension relationship exists and then set
the dimension usage setting to Fact. We can view or edit a fact dimension
relationship on the Dimension Usage tab of Cube Designer. The fact
relationship between a dimension and a measure group has the following
constraints:
– A cube dimension can have only one fact relationship to a particular
measure group.
– A cube dimension can have separate fact relationships to multiple
measure groups.
– The granularity attribute for the relationship must be the key attribute
(such as Transaction Number) for the dimension. This creates a one-to-one
relationship between the dimension and facts in the fact table.
Using the Data Mining Tools
Microsoft SQL Server Analysis Services provides tools that we can use to create
data mining solutions to address specific business problems.
In Business Intelligence Development Studio, the Data Mining Wizard makes it
easy to create mining structures and mining models that are based on OLAP
and relational data sources. We can use the wizard to define structures and
models that use specific data mining techniques to analyze our data. We can
also use Data Mining Designer to refine our mining models further, and to
explore and work with the results of the models.
SQL Server Management Studio provides tools that we can use to manage and
explore our mining models after they are created. SQL Server Integration
Services contains tools that we can use to clean data, to automate tasks such
as creating predictions and updating models, and to create text mining
solutions.
Data Mining Wizard
The Data Mining Wizard is the entry point within Business Intelligence
Development Studio for creating data mining solutions. The wizard is designed
to guide us through the process of creating a data mining structure and an
initial related mining model, and includes the tasks of selecting an algorithm
type and a data source, and defining a case table.
Data Mining Designer
After we use the Data Mining Wizard to create a mining structure and an initial
mining model, the Data Mining Designer opens. In the designer, we can
manage our mining structures, create new mining models, and deploy, browse,
compare, and create predictions against existing mining models.
SQL Server Management Studio
After we create and deploy mining models to a server, we can use SQL Server
Management Studio to perform management and exploratory tasks, such as
viewing and processing the models, and creating predictions against them.
Management Studio also contains a query editor that we can use to design and
execute Data Mining Extensions (DMX) queries.
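For instance, a prediction query written in the DMX query editor might look like the following sketch; the model name, data source name, and input table are hypothetical:

    -- Predict, for each new customer, whether they will buy a bike
    SELECT
        t.[CustomerKey],
        Predict([Bike Buyer]) AS [Predicted Buyer],
        PredictProbability([Bike Buyer]) AS [Probability]
    FROM [TM_Decision_Tree]
    PREDICTION JOIN
        OPENQUERY([Adventure Works DW],
            'SELECT CustomerKey, Age, Gender FROM dbo.NewCustomers') AS t
    ON  [TM_Decision_Tree].[Age] = t.[Age]
    AND [TM_Decision_Tree].[Gender] = t.[Gender]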
Integration Services Data Mining Tasks and Transformations
SQL Server Integration Services provides tools that we can use to automate
common data mining tasks, such as processing a mining model and creating
prediction queries. For example, if we have a mining model that is built from a
dataset of potential customers, we could create an Integration Services package
that automatically updates the model every time the dataset is updated with
new customers. We could then use the package to create a prediction, by
separating the potential customers into two tables. One table could contain
likely customers and the other table customers who are not likely to purchase
any products.
Data Mining Wizard (Analysis Services - Data Mining)
The Data Mining Wizard in Microsoft SQL Server Analysis Services starts every
time that we add a new mining structure to a data mining project. The wizard
helps us define new mining structures and choose the data sources that we
will use for data mining. The wizard can also partition the data in the mining
structure into training and testing sets, and help us add an initial mining
model for each structure.
The content of a mining structure is derived from an existing data source view
or cube. We can choose which columns to include in the mining structure. All
models that are based on that structure can use those columns. We can enable
users of a data mining model to drill down from the results of the mining model
to see additional mining structure columns that were not included in the
mining model itself.
We must make the following decisions when we create a data mining
structure and model by using the Data Mining Wizard:
– Whether to build the data mining structure and models from a relational
database or from an existing cube in an OLAP database.
– How much data to use in training, and how much to set aside for testing.
When we partition a mining structure into training and testing data sets, all
models that are based on that structure can use that testing set.
– Which columns or attributes to use for prediction, and which columns or
attributes to use as input for analysis. Each structure must also contain a
key that uniquely identifies a case record.
– Which algorithm to use. The algorithms provided in SQL Server Analysis
Services have different characteristics and produce different results. We can
create multiple models using different algorithms, or change parameters for
the algorithms to create different models.
The Data Mining Wizard provides functionality to help us make these
decisions:
– Wizard pages in which we define the case set. We can choose case tables
and nested tables from a relational data source, or choose an OLAP data
source and then select the case key and case level columns and optionally
set filters on the cube.
– Dialog boxes that analyze the data in columns and recommend usage for
the columns.
– Auto-detection of column content and data types.
– Automatic slicing of the cube, if our mining model is based on an OLAP data
source.
After we complete the Data Mining Wizard, we use Data Mining Designer to
modify the mining structure and models, view the accuracy of the models,
view characteristics of the structure and models, or make predictions by using
the models.
Using the Data Mining Wizard
To start the Data Mining Wizard, add a new mining structure to an Analysis
Services project by using Solution Explorer or the Project menu in Business
Intelligence Development Studio.
The Data Mining Wizard has two branches, depending on whether our data
source is relational or in a cube:
– Relational Mining Models
– OLAP Mining Models
Relational Mining Models
When we build a mining model from a relational data source in Analysis
Services, we first specify in the Data Mining Wizard that we want to use an
existing relational database to define the structure of the model. We also have
the option of creating just the mining structure, or creating the mining
structure and one associated data mining model. If we choose to create a
mining model, we must specify the data mining technique to use, by selecting
the algorithm that is most appropriate for the type of data mining analysis that
we want.
Specifying the Data Source View and Table Types
The next steps in the wizard are to select the specific data source view that we
want to use to define the mining structure, and to specify a case table. The
case table will be used for training the data mining model, and optionally for
testing it as well. We can also specify a nested table.
Selecting the case table is an important decision. The case table should contain
the entities that we want to analyze: for example, customers and their
demographic information. The nested table usually contains additional
information about the entities in the case table, such as transactions
conducted by the customer, or attributes that have a many-to-one relationship
with the entity. For example, a nested table joined to the Customers case table
might include a list of products purchased by each customer, or a list of
hobbies.
For more information: Nested Tables (Analysis Services - Data Mining)
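To make the case/nested distinction concrete, here is a rough DMX sketch of a model whose nested table lists the products bought by each customer; all names are hypothetical:

    -- The case table is keyed by customer; the nested Products table
    -- has a many-to-one relationship to each case
    CREATE MINING MODEL [Customer Baskets]
    (
        [Customer Key] LONG KEY,
        [Products] TABLE PREDICT
        (
            [Product Name] TEXT KEY
        )
    )
    USING Microsoft_Association_Rules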
Specifying the Column Usage
After we specify the case table and the nested tables, we determine the usage
type for each column in the tables that we will include in the mining structure.
If we do not specify a usage type for a column, the column will not be included
in the mining structure.
Data mining columns can be one of four types: key, input, predictable, or a
combination of input and predictable. Key columns contain a unique identifier
for each row in a table. Some mining models, such as those based on the
sequence clustering or time series algorithms, can contain multiple key
columns. However, these multiple keys are not compound keys in the relational
sense, but instead must be selected so as to provide support for time series
and sequence clustering analysis. For more information, see Microsoft Time
Series Algorithm or Microsoft Sequence Clustering Algorithm.
Input columns provide the information from which predictions are made.
Predictable columns contain the information that we try to predict in the
mining model.
For example, a series of tables may contain customer IDs, demographic
information, and the amount of money each customer spends at a specific
store. The customer ID uniquely identifies the customer and also relates the
case table to the nested tables; therefore, we would use the customer ID as the
key column. We could use a selection of columns from the demographic
information as input columns, and the column that describes the amount of
money each customer spends as a predictable column. We could then build a
mining model that relates demographics to how much money a customer
spends in a store. We could use this model as the basis for targeted marketing.
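A sketch of how this example could be declared directly in DMX, with hypothetical column names; the key, input, and predictable usages map onto the declaration:

    -- Customer ID is the key; demographics are inputs;
    -- the amount spent is the predictable column
    CREATE MINING MODEL [Customer Spending]
    (
        [Customer ID]  LONG KEY,
        [Age]          LONG CONTINUOUS,
        [Gender]       TEXT DISCRETE,
        [Amount Spent] DOUBLE CONTINUOUS PREDICT
    )
    USING Microsoft_Decision_Trees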
The Data Mining Wizard provides the Suggest feature, which is enabled when
we select a predictable column. Datasets frequently contain more columns
than we need to build a mining model. The Suggest feature calculates a
numeric score, from 0 to 1, that describes the relationship between each
column in the dataset and the predictable column. Based on this score, the
feature suggests columns to use as input for the mining model. If we use the
Suggest feature, we can use the suggested columns, modify the selections to fit
our needs, or ignore the suggestions.
Specifying the Content and Data Types
After we select one or more predictable columns and input columns, we can
specify the content and data types for each column.
Split Data into Training and Testing Sets
The final step before we complete the wizard is to partition our data into
training and testing sets. The ability to hold out a portion of the data for testing
is new in SQL Server 2008 and provides an easy-to-use mechanism for
ensuring that a consistent set of test data is available for use with all mining
models associated with the new mining structure.
We can specify that a certain percentage of the data be used for testing, and all
remaining will be used for training. We can also specify the number of cases to
use for testing. The definition of the partition is stored with the mining
structure, so that whenever we create a new model based on the structure, the
testing data set will be available for assessing the accuracy of the model.
For more information: Validating Data Mining Models (Analysis Services - Data
Mining), Partitioning Data into Training and Testing Sets (Analysis Services -
Data Mining)
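The same holdout partition can be expressed in DMX when creating a structure; a minimal sketch, with hypothetical columns:

    -- Hold out 30 percent of the cases for testing
    CREATE MINING STRUCTURE [Targeted Mailing Structure]
    (
        [Customer Key] LONG KEY,
        [Age] LONG CONTINUOUS,
        [Bike Buyer] LONG DISCRETE
    )
    WITH HOLDOUT (30 PERCENT)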
Completing the Wizard
The last step in the wizard is to name the mining structure and the associated
mining model. If we select Allow drill through, the drill through functionality is
enabled in the model. This lets users who have the appropriate permissions
explore the source data that is used to build the model.
OLAP Mining Models
When we build a multidimensional mining model from an OLAP data source in
Analysis Services, we first specify in the Data Mining Wizard that we want to
use an existing cube to define the structure of the model. We have the option of
creating just the mining structure, or creating the mining structure plus one
associated data mining model. If we choose to create a mining model, we must
specify the data mining technique to use, by selecting the algorithm that is
most appropriate for our business problem.
Specifying the Data Source and Case Key
Next, we select the cube dimension to use as the data source to define the
mining structure. Then we select an attribute to use as the key, or case key, of
the mining model.
Specifying Case Level Columns and Column Usage
After we select a case key, the attributes and measures that are associated with
that key are displayed in a tree view on the next page of the wizard. From this
list, we select the attributes and measures to be used as the columns of the
structure. These columns are known as case level columns. As with a relational
model, we must also specify how each column should be used in the structure,
which we can do on the next page of the wizard. Columns can be key, input,
predictable, input and predictable, or unselected.
Adding Nested Tables
The OLAP branch of the Data Mining Wizard includes the option to add nested
tables to the mining model structure. On the Specify Mining Model Column
Usage page of the wizard, click Add Nested Tables to open a separate dialog box
that guides us through the steps to add nested tables. Only the measure
groups that apply to the dimension are displayed. Select a measure group that
contains the foreign key of the case dimension. Next, specify the usage for each
column in the measure group, either input or predictable. The wizard then
adds the nested table to the case table. The default name for the nested table is
the nested dimension name, but we can rename the nested table and its
columns.
For more information: Nested Tables (Analysis Services - Data Mining)
Specifying the Content and Data Types
After we select one or more predictable columns and input columns, we can
specify the content and data types for each column.
Slicing the Source Cube
In the OLAP branch of the wizard, we can limit the scope of our mining model
by slicing the source cube before we train the mining model. Slicing the cube is
similar to adding a WHERE clause to an SQL statement. For example, if a cube
contains information about the purchase of products, we might limit an age
attribute to more than 30, a gender column to only female, and a purchase
date to no earlier than March 2000. In this way we can limit the model's scope
to female customers older than 30 who bought a product after March 2000.
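The equivalent relational filter, written as the WHERE clause the comparison alludes to (table and column names are illustrative):

    -- Restrict the scope the same way the cube slice does
    SELECT *
    FROM Purchases
    WHERE Age > 30
      AND Gender = 'F'
      AND PurchaseDate >= '2000-03-01'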
Split Data into Training and Testing Sets
The final step before we complete the wizard is to partition the data that is
available from the cube into training and testing sets. The definition of the
partition is stored with the mining structure, so that whenever we create a new
model based on the structure, the testing data set will be available for
assessing the accuracy of the model.
Completing the Wizard
The last step in the wizard is to name the mining structure and the associated
mining model. If we select Allow drill through, the drill through functionality is
enabled in the model. This lets users who have the appropriate permissions
explore the source data that is used to build the model. We can also specify
whether we want to add a new dimension to the source cube that is based on
the mining model, or create a new cube from the mining model.
For more concrete practice, we'll apply these theories in the Basic Data Mining
Tutorial in the Result section of this assignment.
Methodology
We took the case study from the tutorial in Microsoft SQL Server 2008 Books
Online: Basic Data Mining Tutorial. In this tutorial, we complete a scenario for
a targeted mailing campaign in which we create three models for analyzing
customer purchasing behavior and targeting potential buyers. The tutorial
demonstrates how to use the data mining algorithms, mining model viewers,
and data mining tools that are included in Microsoft SQL Server Analysis
Services. The fictitious company, Adventure Works Cycles, is used for all
examples. We also added some steps for creating a cube based on the
dimensions that were determined before.
Tutorial Scenario
In this tutorial, we are an employee of Adventure Works Cycles who has been
tasked with learning more about the company's customers based on historical
purchases, and then using that historical data to make predictions that can be
used in marketing. The company has never done data mining before, so we
must create a new database specifically for data mining and set up several data
mining models.
This tutorial is divided into the following lessons:
Lesson 1: Preparing the Analysis Services Database
Lesson 2: Building a Targeted Mailing Structure
Lesson 3: Adding and Processing Models
Lesson 4: Exploring the Targeted Mailing Models
Practical Work Scenarios
The practical work was done on Friday, October 16th, 2009, from 10.00 to
11.00. It finished earlier than planned. We did not use any software or
hardware this time; we only listened to the presentation given by the
assistants. After that, we were assigned to explore the software and document
it in this report.
Result
Lesson 1:
Preparing the Analysis Services Database (Basic Data Mining Tutorial)
In this lesson, we will learn how to create a new Analysis Services database,
add a data source and data source view, and prepare the new database to be
used with data mining.
Create a new project.
Choose Analysis Services Project.
Right-click Data Sources, choose New Data Source...
Click Next
Select the method of connection.
Create a new connection; here we use localhost as the server. Click OK.
Select Use the Service Account. Click Next.
The Completing the Wizard page lets us review the settings.
Note that a new data source appears in the Data Sources folder.
To create a Data Source View, right-click the folder, then choose New Data
Source View...
Click Next.
Select the existing Data Source that we created before.
Select the tables and views we want to include in the Data Source View. Click
Next.
Name the new Data Source View 'Targeted Mailing', then click Finish.
After that, the Data Source View is ready.
Lesson 2:
Building a Targeted Mailing Structure (Basic Data Mining Tutorial)
In this lesson, we will learn how to create a mining model structure that can be
used as part of a targeted mailing scenario.
First, right-click the Mining Structures folder, choose New Mining Structure...
Click Next.
Select the definition method. Here we choose the definition from an existing
relational database. (This differs from the assignment, which requires the
definition to come from a cube, so we will create the cube after these lessons.)
Specify the data mining structure. Here we choose Microsoft Decision Trees as
the algorithm.
Then, we select the Data Source View that we created: Targeted Mailing.
Then we specify table types; we choose the v Target Mail view as the case table.
Then we specify the Training Data. We followed the samples from the tutorial
for this selection.
We can also get suggestions from the software for choosing the right columns.
Specify the columns that will be inputs, the columns to predict, and any
additional columns that are not connected to the others but that we want to
include.
Specify the column content and data type.
Then, we create the testing set.
On the Completing the Wizard page, we can review the settings.
The result looks like the picture above.
Lesson 3:
Adding and Processing Models
In this lesson we will learn how to add models to a structure. The models we
create are built with the following algorithms:
a. Microsoft Decision Trees
b. Microsoft Clustering
c. Microsoft Naive Bayes
Click the Mining Models tab.
Right-click the column, then select New Mining Model...
Input the name of the model. Here we enter TM_Clustering, as this model will
use the Microsoft Clustering algorithm.
After we click OK, we can see the result as in the picture above.
We can also add a model based on the Microsoft Naive Bayes algorithm.
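These designer steps have a DMX equivalent; a sketch of adding the clustering model to the structure from Lesson 2 (column names follow the tutorial and are assumptions):

    -- Add a clustering model to the existing structure
    ALTER MINING STRUCTURE [Targeted Mailing]
    ADD MINING MODEL [TM_Clustering]
    (
        [Customer Key],
        [Age],
        [Bike Buyer] PREDICT
    )
    USING Microsoft_Clustering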
We're done with the models.
To process them, click Mining Model -> Process...
Click Run.
The picture above shows the processing progress.
Lesson 4:
Exploring the Targeted Mailing Models (Basic Data Mining Tutorial)
In this lesson we will learn how to explore and interpret the findings of each
model using the viewers.
To explore the views, we select the Mining Model Viewer tab.
We can also choose the viewer for the Clustering model.
We can also choose the viewer for the Naive Bayes model.
To see the Accuracy Chart, click the Mining Accuracy Chart tab. Set the
predict value to 1 and choose Use mining structure test cases.
We can see the Lift Chart showing the most appropriate algorithm we should
use. In this comparison, we see that Decision Trees is the most appropriate.
Creating a cube
To have an advanced multidimensional analysis, we'll create a cube.
Right-click the Cubes folder, choose New Cube...
Click Next.
Select the creation method. Here we choose to use existing tables.
From the Data Source View 'Targeted Mailing', we identify its two tables, then
we select them.
Then we select the Measures. Here we select all of the available options.
Then we select the existing dimensions.
Select the new dimensions.
On the Completing the Wizard page, we name the cube and review the options
we made.
The two pictures above show the resulting cube.
Discussion
In this practical work we found many problems in connecting SQL Server with
the database we used as the data source. This may have been caused by wrong
installation steps in the last practical work.
We did this practical work by following the tutorial provided with Microsoft
SQL Server 2008, so the underlying theories are not yet entirely clear to us.
We hope the next practical work will help us understand more.
Conclusion
At last we know the big picture of using Analysis Services provided by
Microsoft SQL Server 2008. The points covered are:
– Defining Data Sources (Analysis Services)
– Defining a Data Source Using the Data Source Wizard (Analysis Services)
– Designing Dimensions
– Designing Cubes
– Understanding the Database Schemas
– Introducing the Schema Generation Wizard
– Defining a Fact Relationship and Fact Relationship Properties
– Using the Data Mining Tools
– Data Mining Wizard (Analysis Services – Data Mining)
– Basic Data Mining Tutorial (Result Section)
References
[1] Microsoft. (2009). Download details: Microsoft SQL Server 2008 Books
Online (July 2009). Accessed 2009, from www.microsoft.com:
http://www.microsoft.com/downloads/thankyou.aspx?familyId=765433f7-0983-4d7a-b628-0a98145bcb97&displayLang=en