Telecommunications Technology Association of Korea (TTA), Data Technology Committee, 2001 Workshop
SQL Standard for Data Mining
Oct. 2001
Hwan-Seung Yong
Ewha Womans Univ.
http://dblab.ewha.ac.kr/hsyong
Data Mining Standard by ISO/IEC
• Information Technology – Database Languages –
SQL Multimedia and Application Packages – Part
6: Data Mining
• FCD 13249-6, 2001-05-21
– Vote deadline 2001-10-05
• Other Parts
– Framework
– Full-text
– Spatial
– Still Image
Data Mining
• Association rule
– Given a set of purchase transactions (baskets), each of which
contains a set of items.
– Find rules of the form: If a purchase transaction
contains item X and item Y then the purchase
transaction also contains item Z in N% of all purchase
transactions.
– Example application: store layout.
• Clustering/Segmentation
– Given a set of rows with a set of fields, find sets of rows
with common characteristics, the so-called clusters.
– Example application: customer mailings.
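The association rule definition can be made concrete with a small calculation. The following is a minimal sketch in Python, not part of the standard; the basket contents and the helper names `support` and `confidence` are made up for illustration. It measures how often a rule "X and Y, then Z" holds.

```python
# Toy purchase transactions (baskets); the items are illustrative only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule: if a basket contains bread and butter, it also contains milk.
rule_support = support({"bread", "butter", "milk"}, baskets)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, baskets)
print(rule_support, rule_confidence)
```

Support is measured against all transactions, while confidence is measured only against the transactions that contain the antecedent; mining algorithms usually apply a minimum threshold to both.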
Data Mining
• Classification
– Given a set of rows with a set of fields and a special field,
the so-called class label, compute a classification
model such that the class label can be predicted from
the model and a set of field values without the
class label.
– Example application: insurance risk prediction.
• Regression
– Regression is very similar to classification except for
the type of the predicted value: rather than predicting a
class label, regression predicts a continuous value.
– Example application: customer ranking.
Computational Patterns
• Training Phase
– Common to all data mining techniques, this is the phase
in which the data mining model is computed.
• Application Phase
– Phase during which a row is evaluated against a data
mining model and one or more values are computed
• Test Phase
– Phase that reads a set of rows containing values for the
target field, evaluates each row as in the application
phase, and compares the predicted value to the actual
value in the target field.
– Used only for classification and regression.
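The three phases can be sketched with a deliberately simple classifier. This is an illustrative Python sketch only: the "model" is just the mean of a hypothetical `age` field per class label, and the function names `train`, `apply_model`, and `evaluate` are assumptions rather than anything defined by the standard.

```python
def train(rows, target):
    """Training phase: compute the model (here, the mean age per class label)."""
    sums, counts = {}, {}
    for row in rows:
        label = row[target]
        sums[label] = sums.get(label, 0.0) + row["age"]
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def apply_model(model, row):
    """Application phase: evaluate one row against the model and
    return the predicted class label (nearest class mean)."""
    return min(model, key=lambda label: abs(row["age"] - model[label]))


def evaluate(model, rows, target):
    """Test phase: predict each row as in the application phase and
    compare the prediction with the actual value in the target field."""
    correct = sum(1 for row in rows if apply_model(model, row) == row[target])
    return correct / len(rows)


training_rows = [
    {"age": 22, "risk": "high"},
    {"age": 25, "risk": "high"},
    {"age": 47, "risk": "low"},
    {"age": 52, "risk": "low"},
]
held_out_rows = [
    {"age": 30, "risk": "high"},
    {"age": 45, "risk": "low"},
    {"age": 38, "risk": "high"},
]

model = train(training_rows, "risk")
prediction = apply_model(model, {"age": 28})
accuracy = evaluate(model, held_out_rows, "risk")
print(model, prediction, accuracy)
```

The test phase needs rows whose target field is already known, which is why it applies only to techniques that predict a value.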
Standard Language
• Defined through a set of user-defined types (UDTs)
– A specification only, not an implementation
• Data Mining Model Types
– Storage and retrieval of data mining model values
• Data Mining Setting Types
– Define a target field and parameters for algorithms
• Data Mining Application Result Types
– The result of applying a mining model to a row
• Data Mining Data Types
• Status Code
Data Mining Model Types
• Type: DM_RuleModel
– Holds the result of association rule mining
• Methods
– DM_impRuleModel(CHARACTER LARGE
OBJECT(DM_MaxContentLength))
• Imports a rule model expressed in PMML
• Returns a DM_RuleModel
– DM_expRuleModel(): exports the rule model as PMML
– DM_getNORules(): returns the number of rules
– DM_getRuleTask(): returns the data mining task value
• Data mining settings, etc.
Data Mining Settings Types
• Set mining parameters
• DM_RuleSettings
• Methods
– setMinSupport(DOUBLE PRECISION)
– getMinSupport()
– DM_ruleUseDataSpec(DM_LogicalDataSpec)
• The logical data specification is an abstraction of the source
table for input data
– DM_ruleGetDataSpec()
– DM_ruleSetGroup(CHARACTER VARYING)
• Identifies the grouping field for association mining
• Example: the purchase transaction
– DM_ruleGetGroup()
Data Mining Application Result Types
• The result of applying a mining model to a row
• DM_ClusResult Type
• Methods
– DM_getClusterID()
• Returns the cluster identification number
– DM_getQuality()
• Returns the degree of fit to the predicted cluster
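To show what such an application result might carry, here is a hypothetical Python analogue of DM_ClusResult. The class `ClusResult`, the function `apply_clustering`, the centroid-based model, and the quality formula are all assumptions made for illustration; the slides do not say how the standard computes quality.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusResult:
    """Hypothetical stand-in for a DM_ClusResult value."""
    cluster_id: int  # what DM_getClusterID() would return
    quality: float   # what DM_getQuality() would return


def apply_clustering(centroids, row):
    """Apply a clustering model (a dict of cluster id -> centroid) to one row."""
    distances = {cid: math.dist(center, row) for cid, center in centroids.items()}
    best = min(distances, key=distances.get)
    # Assumed quality measure: 1.0 at the centroid, falling toward 0 with distance.
    return ClusResult(cluster_id=best, quality=1.0 / (1.0 + distances[best]))


centroids = {1: (0.0, 0.0), 2: (10.0, 10.0)}
result = apply_clustering(centroids, (3.0, 4.0))
print(result.cluster_id, result.quality)
```

The point of the result type is that one application of the model yields several values at once, which the caller then reads through separate accessor methods.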
Data Mining Data Types
• Represent input data needed for data mining
• DM_LogicalDataSpec Type and Routines
– Abstraction of the set of fields used for mining
• Methods
– DM_addDataSpecFld(CHARACTER VARYING)
– DM_remDataSpecFld(CHARACTER VARYING)
– DM_getNOFields()
– DM_getFieldName(INTEGER)
– DM_setFieldType(CHARACTER VARYING, SMALLINT)
• Two kinds of field type: categorical and numeric
– DM_getFieldType(CHARACTER VARYING)
– DM_compatibleSpec(DM_LogicalDataSpec)
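The role of a logical data specification can be illustrated with a small Python analogue. This is a sketch under stated assumptions, not the SQL type itself: the class and method names are hypothetical, the type codes 0 and 1 are placeholders (the slides do not give the SMALLINT values), positions are assumed to be 1-based, and the compatibility rule shown is one plausible reading.

```python
CATEGORICAL, NUMERIC = 0, 1  # placeholder codes, not taken from the standard


class LogicalDataSpec:
    """Hypothetical analogue of DM_LogicalDataSpec: an ordered set of typed fields."""

    def __init__(self):
        self._fields = {}  # field name -> type code, in insertion order

    def add_field(self, name):                 # cf. DM_addDataSpecFld
        if name in self._fields:
            raise ValueError(f"duplicate field: {name}")
        self._fields[name] = None

    def remove_field(self, name):              # cf. DM_remDataSpecFld
        del self._fields[name]

    def field_count(self):                     # cf. DM_getNOFields
        return len(self._fields)

    def field_name(self, position):            # cf. DM_getFieldName (1-based, assumed)
        if not 1 <= position <= len(self._fields):
            raise IndexError(position)
        return list(self._fields)[position - 1]

    def set_field_type(self, name, type_code):  # cf. DM_setFieldType
        if name not in self._fields:
            raise KeyError(name)
        self._fields[name] = type_code

    def field_type(self, name):                # cf. DM_getFieldType
        return self._fields[name]

    def is_compatible(self, other):            # cf. DM_compatibleSpec (assumed rule)
        """True if other has every field of this spec with the same type."""
        return all(
            name in other._fields and other._fields[name] == type_code
            for name, type_code in self._fields.items()
        )


spec = LogicalDataSpec()
for name in ("age", "gender", "zip"):
    spec.add_field(name)
spec.set_field_type("age", NUMERIC)
spec.set_field_type("gender", CATEGORICAL)
spec.remove_field("zip")

wider = LogicalDataSpec()
for name, code in (("age", NUMERIC), ("gender", CATEGORICAL), ("income", NUMERIC)):
    wider.add_field(name)
    wider.set_field_type(name, code)
```

Keeping the field list separate from any concrete table is what lets the same settings and models be reused against different source tables that expose compatible fields.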
Data Mining Data Types
• DM_MiningData Type and Routines
– Abstraction of input data for mining
• Methods
– DM_defMiningData(CHARACTER VARYING)
• Takes the name of a source table and defines it as mining data
– DM_defFldAlias(CHARACTER VARYING, CHARACTER VARYING)
• Defines an alias for a field name
– DM_genLogDataSpec()
• Generates a value of type DM_LogicalDataSpec
PMML
• Predictive Model Markup Language
– Makes it easy to define predictive models and share them
between companies
– XML based
– Driven by the Data Mining Group
• www.dmg.org
• Consortium of major data mining vendors
– Current version: 2.0
– PMML 2.0 was presented at the 2nd workshop on PMML,
held Aug 26, 2001
DTD of Association Rules Model
<!ELEMENT AssociationModel (Extension*, AssocInputStats,
AssocItem+, AssocItemset+, AssocRule+)>
<!ATTLIST AssociationModel
modelName CDATA #IMPLIED
>
<!ELEMENT AssocInputStats EMPTY>
<!ATTLIST AssocInputStats
numberOfTransactions %INT-NUMBER; #REQUIRED
maxNumberOfItemsPerTA %INT-NUMBER; #IMPLIED
avgNumberOfItemsPerTA %REAL-NUMBER; #IMPLIED
minimumSupport %PROB-NUMBER; #REQUIRED
minimumConfidence %PROB-NUMBER; #REQUIRED
lengthLimit %INT-NUMBER; #IMPLIED
numberOfItems %INT-NUMBER; #REQUIRED
numberOfItemsets %INT-NUMBER; #REQUIRED
numberOfRules %INT-NUMBER; #REQUIRED
>
PMML Example: Association Rule 1/2
• t1: Cracker, Coke, Water
• t2: Cracker, Water
• t3: Cracker, Water
• t4: Cracker, Coke, Water
<?xml version="1.0" ?>
<PMML version="1.1">
  <Header copyright="www.dmg.org"
          description="example model for association rules"/>
  <DataDictionary numberOfFields="1">
    <DataField name="item" optype="categorical"/>
  </DataDictionary>
  <AssociationModel>
    <AssocInputStats numberOfTransactions="4"
        numberOfItems="3" minimumSupport="0.6"
        minimumConfidence="0.5" numberOfItemsets="3"
        numberOfRules="2"/>
    <!-- We have three items in our input data -->
    <AssocItem id="1" value="Cracker"/>
    <AssocItem id="2" value="Coke"/>
    <AssocItem id="3" value="Water"/>
    <!-- and two frequent itemsets with a single item -->
PMML Example: Association Rule 2/2
    <AssocItemset id="1" support="1.0" numberOfItems="1">
      <AssocItemRef itemRef="1"/>
    </AssocItemset>
    <AssocItemset id="2" support="1.0" numberOfItems="1">
      <AssocItemRef itemRef="3"/>
    </AssocItemset>
    <!-- and one frequent itemset with two items. -->
    <AssocItemset id="3" support="1.0" numberOfItems="2">
      <AssocItemRef itemRef="1"/>
      <AssocItemRef itemRef="3"/>
    </AssocItemset>
    <!-- Two rules satisfy the requirements -->
    <AssocRule support="1.0" confidence="1.0" antecedent="1"
               consequent="2"/>
    <AssocRule support="1.0" confidence="1.0" antecedent="2"
               consequent="1"/>
  </AssociationModel>
</PMML>
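One detail of the example is easy to miss: the antecedent and consequent attributes of AssocRule refer to itemset ids, not to item ids. The sketch below, which is only an illustration and not part of PMML or the SQL standard, reads a well-formed copy of the example with Python's standard `xml.etree.ElementTree` module and resolves those references back to item names. The variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the example model from the slides.
PMML_DOC = """<?xml version="1.0" ?>
<PMML version="1.1">
  <Header copyright="www.dmg.org"
          description="example model for association rules"/>
  <DataDictionary numberOfFields="1">
    <DataField name="item" optype="categorical"/>
  </DataDictionary>
  <AssociationModel>
    <AssocInputStats numberOfTransactions="4" numberOfItems="3"
        minimumSupport="0.6" minimumConfidence="0.5"
        numberOfItemsets="3" numberOfRules="2"/>
    <AssocItem id="1" value="Cracker"/>
    <AssocItem id="2" value="Coke"/>
    <AssocItem id="3" value="Water"/>
    <AssocItemset id="1" support="1.0" numberOfItems="1">
      <AssocItemRef itemRef="1"/>
    </AssocItemset>
    <AssocItemset id="2" support="1.0" numberOfItems="1">
      <AssocItemRef itemRef="3"/>
    </AssocItemset>
    <AssocItemset id="3" support="1.0" numberOfItems="2">
      <AssocItemRef itemRef="1"/>
      <AssocItemRef itemRef="3"/>
    </AssocItemset>
    <AssocRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
    <AssocRule support="1.0" confidence="1.0" antecedent="2" consequent="1"/>
  </AssociationModel>
</PMML>"""

root = ET.fromstring(PMML_DOC)
model = root.find("AssociationModel")

# Item id -> item name.
items = {i.get("id"): i.get("value") for i in model.findall("AssocItem")}

# Itemset id -> list of item names, resolved through AssocItemRef.
itemsets = {
    s.get("id"): [items[ref.get("itemRef")] for ref in s.findall("AssocItemRef")]
    for s in model.findall("AssocItemset")
}

# Each rule points at two itemsets by id.
rules = [
    (itemsets[r.get("antecedent")], itemsets[r.get("consequent")],
     float(r.get("confidence")))
    for r in model.findall("AssocRule")
]

for antecedent, consequent, conf in rules:
    print(f"{antecedent} -> {consequent} (confidence {conf})")
```

Read this way, the two rules say that Cracker implies Water and Water implies Cracker. Coke appears in only two of the four transactions, so it falls below the minimum support of 0.6 and occurs in no frequent itemset.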
Final Remarks
• Data mining is a hot and promising area, like DBMS
• Standards activity
– The SQL data mining standard is ready
– The PMML standard for exchanging mining results is ready
– But no software supports them yet
• Further research and standardization areas
– SQL for multimedia data mining