Data Mining:
Concepts and Techniques
Slides for Textbook
Appendix A
©Jiawei Han and Micheline Kamber
Slides contributed by Jian Pei
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
May 22, 2017
Data Mining: Concepts and Techniques
1
Appendix A: An Introduction to
Microsoft’s OLE DB for Data Mining

Introduction

Overview and design philosophy

Basic components

Data set components

Data mining models

Operations on data model

Concluding remarks
Why OLE DB for Data Mining?



Industry standard is critical for data mining development, usage, interoperability, and exchange
OLE DB for DM is a natural evolution from OLE DB and OLE DB for OLAP
Building mining applications over relational databases is nontrivial
  - Need different customized data mining algorithms and methods
  - Significant work on the part of application builders
Goal: ease the burden of developing mining applications in large relational databases
Motivation of OLE DB for DM

Facilitate deployment of data mining models
  - Generate data mining models
  - Store, maintain, and refresh models as data is updated
  - Programmatically use the model on other data sets
  - Browse models
Enable enterprise application developers to participate in building data mining solutions
Features of OLE DB for DM

Independent of provider or software
Not specialized to any specific mining model
  - Structured to cater to all well-known mining models
Part of upcoming release of Microsoft SQL Server 2000
Overview

Core relational engine exposes OLE DB in a language-based API
Analysis server exposes OLE DB OLAP and OLE DB DM
Maintain SQL metaphor
Reuse existing notions
[Figure: layered stack, top to bottom: data mining applications, OLE DB OLAP/DM, Analysis Server, OLE DB, RDB engine]
Key Operations to Support Data
Mining Models




Define a mining model
  - Attributes to be predicted
  - Attributes to be used for prediction
  - Algorithm used to build the model
Populate a mining model from training data
Predict attributes for new data
Browse a mining model for reporting and visualization
DMM As Analogous to A Table in SQL






Create a data mining model object
  - CREATE MINING MODEL [model_name]
Insert training data into the model and train it
  - INSERT INTO [model_name]
Use the data mining model
  - SELECT relation_name.[id], [model_name].[predict_attr]
  - Consult DMM content in order to make predictions and browse statistics obtained by the model
Use DELETE to empty/reset the model
Predictions on datasets: prediction join between a model and a data set (tables)
Deploy a DMM by just writing SQL queries!
Two Basic Components

Cases/caseset: input data
  - A table or nested tables (for hierarchical data)
Data mining model (DMM): a special type of table
  - A caseset is associated with a DMM and meta-info while creating a DMM
  - Save mining algorithm and resulting abstraction instead of data itself
  - Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP
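
The point that a DMM keeps the learned abstraction rather than the training rows can be sketched in Python. This is a toy illustration only, not the OLE DB for DM API: the class `ToyMiningModel`, its method names, the `AgeGroup` column, and the majority-vote "algorithm" are all hypothetical stand-ins chosen to mirror the operations listed above.

```python
from collections import Counter, defaultdict


class ToyMiningModel:
    """Hypothetical stand-in for a DMM: stores learned counts, not the cases."""

    def __init__(self, source, target):
        # Loosely mirrors CREATE MINING MODEL: name the source and predicted columns.
        self.source = source
        self.target = target
        self.counts = defaultdict(Counter)

    def insert_into(self, cases):
        # Loosely mirrors INSERT INTO: consume cases and update the abstraction.
        for case in cases:
            self.counts[case[self.source]][case[self.target]] += 1

    def predict(self, case):
        # Return the most frequent target value seen for this source value.
        seen = self.counts.get(case[self.source])
        return seen.most_common(1)[0][0] if seen else None

    def delete(self):
        # Loosely mirrors DELETE FROM: reset learned content, keep the definition.
        self.counts.clear()


model = ToyMiningModel(source="Gender", target="AgeGroup")
model.insert_into([
    {"Gender": "Male", "AgeGroup": "30-39"},
    {"Gender": "Male", "AgeGroup": "30-39"},
    {"Gender": "Female", "AgeGroup": "20-29"},
])
prediction = model.predict({"Gender": "Male"})
model.delete()
after_reset = model.predict({"Gender": "Male"})
```

After `insert_into`, the object holds only per-value counts; the input rows themselves are not retained, and `delete` empties the counts while the column definition stays in place.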
Flattened Representation of Caseset
Source tables:
  Customers:         Customer ID, Gender, Hair Color, Age, Age Prob
  Product Purchases: Customer ID, Product Name, Quantity, Product Type
  Car Ownership:     Customer ID, Car, Car Prob

Flattened caseset:

CID  Gend  Hair   Age  Age prob  Prod  Quan  Type  Car  Car prob
1    Male  Black  35   100%      TV    1     Elec  Car  100%
1    Male  Black  35   100%      VCR   1     Elec  Car  100%
1    Male  Black  35   100%      Ham   6     Food  Car  100%
1    Male  Black  35   100%      TV    1     Elec  Van  50%
1    Male  Black  35   100%      VCR   1     Elec  Van  50%
1    Male  Black  35   100%      Ham   6     Food  Van  50%

Problem: Lots of replication!
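
The replication can be reproduced with a short Python sketch. It assumes the flattened caseset is what a plain join of the customer row with both child tables would produce; the variable names are illustrative and the values are taken from the slide.

```python
from itertools import product

customer = {"CID": 1, "Gend": "Male", "Hair": "Black", "Age": 35, "Age prob": "100%"}
purchases = [
    {"Prod": "TV", "Quan": 1, "Type": "Elec"},
    {"Prod": "VCR", "Quan": 1, "Type": "Elec"},
    {"Prod": "Ham", "Quan": 6, "Type": "Food"},
]
cars = [
    {"Car": "Car", "Car prob": "100%"},
    {"Car": "Van", "Car prob": "50%"},
]

# Joining both child tables to the customer yields every purchase/car combination.
flattened = [{**customer, **p, **c} for c, p in product(cars, purchases)]

# The customer attributes are repeated once per flattened row.
copies_of_customer = sum(1 for row in flattened if row["CID"] == 1)
```

Three purchases and two cars give 3 x 2 = 6 rows, so the single customer record is stored six times.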
Logical Nested Table Representation
of Caseset

Use Data Shaping Service to generate a hierarchical rowset
  - Part of Microsoft Data Access Components (MDAC) products

CID  Gend  Hair   Age  Age prob  Product Purchases    Car Ownership
                                 Prod  Quan  Type     Car  Car prob
1    Male  Black  35   100%      TV    1     Elec     Car  100%
                                 VCR   1     Elec     Van  50%
                                 Ham   6     Food
More About Nested Table



Not necessary for the storage subsystem to support nested records
Cases are only instantiated as nested rowsets prior to training/predicting data mining models
Same physical data may be used to generate different casesets
Defining A Data Mining Model

The name of the model

The algorithm and parameters

The columns of caseset and the relationships
among columns

“Source columns” and “prediction columns”
Example
CREATE MINING MODEL [Age Prediction]                 -- name of model
(
    [Customer ID]        LONG KEY,                   -- source column
    [Gender]             TEXT DISCRETE,              -- source column
    [Age]                DOUBLE DISCRETIZED() PREDICT,  -- prediction column
    [Product Purchases]  TABLE                       -- source column (nested table)
    (
        [Product Name]   TEXT KEY,                   -- source column
        [Quantity]       DOUBLE NORMAL CONTINUOUS,   -- source column
        [Product Type]   TEXT DISCRETE RELATED TO [Product Name]  -- source column
    )
)
USING [Decision_Trees_101]                           -- mining algorithm used
Column Specifiers




KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
  - PROBABILITY: [0, 1]
  - VARIANCE
  - SUPPORT
  - PROBABILITY_VARIANCE
  - ORDER
  - TABLE
Attribute Types

DISCRETE

ORDERED

CYCLICAL

CONTINUOUS

DISCRETIZED

SEQUENCE_TIME
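
As a rough illustration of what DISCRETIZED implies for a column such as [Age], the Python sketch below maps a continuous value to a bucket. The cut points and labels are hypothetical; how a provider actually chooses its buckets is not specified in these slides.

```python
from bisect import bisect_right


def discretize(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    return bisect_right(boundaries, value)


boundaries = [20, 30, 40, 50]  # hypothetical cut points
labels = ["<20", "20-29", "30-39", "40-49", "50+"]

age_bucket = labels[discretize(35, boundaries)]
edge_bucket = labels[discretize(30, boundaries)]
```

A value equal to a cut point is placed in the bucket that starts at that cut point, so 30 and 35 share a bucket here.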
Populating A DMM

Use INSERT INTO statement

Consuming a case using the data mining model

Use SHAPE statement to create the nested
table from the input data
Example: Populating a DMM
INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
    {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
(
    {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales
     ORDER BY [CustID]}
    RELATE [Customer ID] TO [CustID]
)
AS [Product Purchases]
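
What the SHAPE ... APPEND ... RELATE clause produces can be approximated in Python. The sketch below is not the Data Shaping Service; the helper `shape` is a hypothetical function that nests child rows under the parent row with the matching key, and the second customer and its purchase are invented for illustration.

```python
from collections import defaultdict


def shape(parents, children, parent_key, child_key, alias):
    """Nest under each parent the child rows whose key matches (toy SHAPE/APPEND/RELATE)."""
    by_key = defaultdict(list)
    for child in children:
        # Leave the relating key out of the nested rows, similar in spirit
        # to SKIP in the INSERT INTO column list.
        nested = {k: v for k, v in child.items() if k != child_key}
        by_key[child[child_key]].append(nested)
    return [{**parent, alias: by_key.get(parent[parent_key], [])} for parent in parents]


customers = [
    {"Customer ID": 1, "Gender": "Male", "Age": 35},
    {"Customer ID": 2, "Gender": "Female", "Age": 28},  # invented row
]
sales = [
    {"CustID": 1, "Product Name": "TV", "Quantity": 1, "Product Type": "Elec"},
    {"CustID": 1, "Product Name": "VCR", "Quantity": 1, "Product Type": "Elec"},
    {"CustID": 1, "Product Name": "Ham", "Quantity": 6, "Product Type": "Food"},
    {"CustID": 2, "Product Name": "Ham", "Quantity": 2, "Product Type": "Food"},  # invented row
]

cases = shape(customers, sales, "Customer ID", "CustID", "Product Purchases")
```

Each element of `cases` is one case: the customer columns appear once, and the purchases sit in a nested list under `Product Purchases`.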
Using Data Model to Predict



Prediction join
  - Prediction on dataset D using DMM M
  - Different from an equi-join
DMM: a “truth table”
SELECT statement associated with PREDICTION JOIN specifies values extracted from DMM
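
One way to see the difference from an equi-join is the Python sketch below. It is a conceptual toy under stated assumptions: `predict_age` is a hypothetical hand-written rule standing in for a trained DMM, and the customer rows are invented. An equi-join can only return values for keys stored in the table, while the prediction join asks the model for a value for every input case.

```python
def equi_join(cases, table, key):
    """Ordinary equi-join: keep only cases whose key value exists in the table."""
    return [
        (case["Customer ID"], row["Age"])
        for case in cases
        for row in table
        if row[key] == case[key]
    ]


def prediction_join(cases, predict):
    """Toy prediction join: ask the model for a value for every case."""
    return [(case["Customer ID"], predict(case)) for case in cases]


def predict_age(case):
    # Hypothetical learned rule standing in for a trained DMM.
    return 35 if case["Gender"] == "Male" else 28


stored_rows = [{"Gender": "Male", "Age": 35}]
new_cases = [
    {"Customer ID": 10, "Gender": "Male"},
    {"Customer ID": 11, "Gender": "Female"},
]

joined = equi_join(new_cases, stored_rows, "Gender")
predicted = prediction_join(new_cases, predict_age)
```

Here the equi-join drops customer 11 because no stored row has a matching value, whereas the prediction join returns one result per input case.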
Example: Using a DMM in Prediction
SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
(
    SHAPE
        {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
    APPEND
    (
        {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
        RELATE [Customer ID] TO [CustID]
    )
    AS [Product Purchases]
)
AS t
ON  [Age Prediction].[Gender] = t.[Gender] AND
    [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name] AND
    [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]
Browsing DMM

What is in a DMM?
  - Rules, formulas, trees, etc.
Browsing DMM
  - Visualization
Concluding Remarks



OLE DB for DM integrates data mining and database systems
  - A good standard for mining application builders
How can we be involved?
  - Provide association/sequential pattern mining modules for OLE DB for DM?
  - Design more concrete language primitives?
References
  - http://www.microsoft.com/data.oledb/dm.html
www.cs.uiuc.edu/~hanj
Thank you !!!