Towards Using Grid Services for Mining Fuzzy Association Rules
Mihai Gabroveanu, Ion Iancu, Mirel Cosulschi, Nicolae Constantinescu
Faculty of Mathematics and Computer Science, University of Craiova, ROMANIA
{mihaiug, mirelc, nikyc}@central.ucv.ro, i [email protected]

Introduction
• In this paper we show how the Knowledge Grid infrastructure can be used to implement a distributed algorithm for mining fuzzy association rules from distributed databases over a Grid network.

Outline
• Knowledge Grid services
• Distributed fuzzy association rules mining
• Distributed problem definition
• The distributed algorithm
• Rules mining implementation over the Grid
• Conclusion

Knowledge Grid Services-1
• The Knowledge Grid ([4], [5], [6]) defines an integrating architecture for distributed data mining and knowledge discovery.
• It uses basic grid services to build specific knowledge services, organized in two layers:
• the Core K-grid layer: offers services directly implemented on top of generic grid services;
• the High level K-grid layer: is used to describe, develop and execute distributed knowledge discovery computations.

Knowledge Grid Services-2
• Knowledge directory service (KDS). This service extends the basic Globus MDS service and is responsible for maintaining a description of all the data and tools used in the Knowledge Grid.
• The metadata information is stored in a Knowledge Metadata Repository (KMR).
• The Knowledge Base Repository (KBR) is used to store the discovered knowledge.
• Another important repository is the Knowledge Execution Plan Repository (KEPR). It is used to maintain the execution plans for data mining processes.
• Resource allocation and execution management service (RAEMS). These services are used to find the best mapping between an execution plan and the available resources, with the goal of satisfying the application requirements.

Knowledge Grid Services-2
• Data Access Service (DAS). This service is responsible for the search and selection (data search services), and for the extraction, transformation and delivery (data extraction service), of the data to be mined.
• Tools and algorithms access service (TASS). This service is responsible for the search, selection, and downloading of data mining tools and algorithms.
• Execution plan management service (EPMS). A semi-automatic tool that takes the data and programs selected by the user and generates a set of different, possible execution plans that meet user, data and algorithm requirements and constraints.
• Results presentation service (RPS). This service specifies how to generate, present and visualize the models extracted.

Distributed fuzzy association rules mining-1
• Database: DB = {t1, . . . , tn}
• Set of items (attributes): I = {i1, . . . , im}
• Ex: I = {Age, Income, Weight}

Distributed fuzzy association rules mining-2
• For example, for the attribute Weight we can take into consideration the following three fuzzy sets: "thin", "middle" and "fat".
• FWeight = {thin, middle, fat}

Distributed fuzzy association rules mining-3
• Fuzzy itemset: 〈X, FX〉 = 〈{Age, Income}, {young, high}〉

Distributed fuzzy association rules mining-4
• "If Age is middle and Income is high then Weight is fat"
• X = {Age, Income}, Y = {Weight}, FX = {middle, high}, FY = {fat}
• 〈X, FX〉 ⇒ 〈Y, FY〉
• 〈{Age, Income}, {middle, high}〉 ⇒ 〈{Weight}, {fat}〉

Distributed fuzzy association rules mining-4 (continued)
• Membership degrees of 〈{Age, Income}, {middle, high}〉 in two transactions:
  T1: 〈{Age, Income}, {0.5, 1}〉
  T2: 〈{Age, Income}, {1, 1}〉
• The fuzzy support value of the itemset 〈X, FX〉 = 〈{Age, Income}, {middle, high}〉 is
  (0.5 × 1 + 1 × 1) / 2 = 1.5 / 2 = 0.75

Distributed fuzzy association rules mining-5
• An association rule is considered interesting if it has enough support and a high confidence value. Such a rule is called a strong rule.

Distributed fuzzy association rules mining-6
• The problem of sequential mining of fuzzy association rules can be decomposed into two subproblems:
1. find all large fuzzy itemsets.
2. generate the fuzzy association rules from the large fuzzy itemsets found.
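The support computation from the T1/T2 example can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `fuzzy_support` and `is_large`, the dictionary layout of a record, and the `>=` comparison against `minsup` are assumptions made here, not definitions taken from the slides. The support is computed as in the worked example: the product of the membership degrees in each transaction, summed over the transactions and divided by their number.

```python
from math import prod

# Each record maps (attribute, fuzzy_set) -> membership degree in [0, 1].
# The degrees are those of T1 and T2 in the example; this layout is an
# assumption made for illustration.
records = [
    {("Age", "middle"): 0.5, ("Income", "high"): 1.0},  # T1
    {("Age", "middle"): 1.0, ("Income", "high"): 1.0},  # T2
]


def fuzzy_support(itemset, records):
    """Sum over records of the product of membership degrees, divided by the record count."""
    if not records:
        return 0.0
    total = sum(prod(r.get(item, 0.0) for item in itemset) for r in records)
    return total / len(records)


def is_large(itemset, records, minsup):
    """An itemset is treated as large when its support reaches minsup (assumed >=)."""
    return fuzzy_support(itemset, records) >= minsup


itemset = [("Age", "middle"), ("Income", "high")]
print(fuzzy_support(itemset, records))  # 0.75
```

With a hypothetical threshold such as `minsup = 0.5`, `is_large(itemset, records, 0.5)` holds for this itemset, so it would be kept as a large fuzzy itemset in subproblem 1.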
Example
Record   Age   Weight   young   old   thin   fat
1        15    40       1       0     0.5    0
2        30    70       0       0.5   0.5    1

Support count of large fuzzy itemsets
• 〈{Age, Weight}, {young, thin}〉 ⇒ 1 × 0.5 + 0 × 0.5 = 0.5 > minsup
• 〈{Age, Weight}, {young, fat}〉 ⇒ 1 × 0 + 0 × 1 = 0
• 〈{Age, Weight}, {old, thin}〉 ⇒ 0 × 0.5 + 0.5 × 0.5 = 0.25
• 〈{Age, Weight}, {old, fat}〉 ⇒ 0 × 0 + 0.5 × 1 = 0.5

Distributed problem definition-1
• Let DB = {DB1, DB2, . . . , DBn} be a distributed database over n sites S1, S2, . . . , Sn.

Distributed problem definition-2 to -5
[Formula slides; content not preserved in the source.]

Distributed Mining Fuzzy Association Rules
• Given the set of items I, the distributed database DB = {DB1, DB2, . . . , DBn}, the fuzzy sets associated with the attributes from I, the minimum support threshold (minsup) and the minimum confidence threshold (minconf), extract all global fuzzy association rules.
1. find all global large fuzzy itemsets.
2. generate the global fuzzy association rules from the global large fuzzy itemsets found.

Fuzzy Count Distribution Algorithm
[Diagram labels: local large fuzzy 1-itemsets (one set per site); global candidate 1-itemsets CA(1); globally large fuzzy 1-itemsets L(1).]
• CA(k) = Fuzzy_Apriori_Gen(L(k−1)).

Rules mining implementation over the Grid-1
[Figure: Distributed Rules Mining Scenario]

Rules mining implementation over the Grid-2
In order to present the implementation of this process in a Grid network we shall consider that:
• the database DB is stored on K-grid node NodeA.
• the tools needed for mining association rules (the partitioner P, the frequent itemsets mining tool and the association rules extractor) are available as multiplatform executables on K-grid node NodeS.
• the results will be stored in the Knowledge Base Repository (KBR) on NodeU.
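Returning to the Fuzzy Count Distribution Algorithm named earlier, the general count-distribution idea can be sketched as follows: every site computes fuzzy support counts for the same candidates on its own partition, the counts are summed, and a candidate is kept when its global support reaches minsup. The slides do not spell out the algorithm, so this is a sketch under stated assumptions rather than the authors' procedure: the names `local_counts` and `global_large`, the normalization by the total number of records, the `>=` test and the threshold 0.2 are all hypothetical. The membership degrees are those of the two-record Age/Weight example, with one record placed on each site.

```python
from math import prod


def local_counts(candidates, records):
    """Fuzzy support count of each candidate on one site's partition."""
    return {
        c: sum(prod(r.get(item, 0.0) for item in c) for r in records)
        for c in candidates
    }


def global_large(candidates, sites, minsup):
    """Sum the local counts of all sites and keep the globally large candidates."""
    total_records = sum(len(site) for site in sites)
    totals = {c: 0.0 for c in candidates}
    for site in sites:  # sequential here; on a Grid each site would run on its own node
        for c, count in local_counts(candidates, site).items():
            totals[c] += count
    supports = {c: totals[c] / total_records for c in candidates}
    return {c: s for c, s in supports.items() if s >= minsup}


# One record per site (hypothetical partitioning of the example table).
site1 = [{("Age", "young"): 1.0, ("Age", "old"): 0.0,
          ("Weight", "thin"): 0.5, ("Weight", "fat"): 0.0}]
site2 = [{("Age", "young"): 0.0, ("Age", "old"): 0.5,
          ("Weight", "thin"): 0.5, ("Weight", "fat"): 1.0}]
sites = [site1, site2]

young_thin = (("Age", "young"), ("Weight", "thin"))
young_fat = (("Age", "young"), ("Weight", "fat"))
old_thin = (("Age", "old"), ("Weight", "thin"))
old_fat = (("Age", "old"), ("Weight", "fat"))
candidates = [young_thin, young_fat, old_thin, old_fat]

large = global_large(candidates, sites, minsup=0.2)
print(large[young_thin], large[old_fat])  # 0.25 0.25
```

Only the per-candidate counts cross site boundaries in this scheme, never the records themselves, which is what makes the approach suitable for data that stays on separate nodes.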
Rules mining implementation over the Grid-3
• Let us suppose that a Grid User (GU) needs to extract all association rules from the database DB using the tools available on K-grid node NodeS.
• Step 1. The GU starts the search for computational resources for executing the data mining process from his K-grid node NodeU. The KDS (Knowledge Directory Service) is used to locate the computational resources needed to execute the mining process.

Rules mining implementation over the Grid-4
• Step 2. The GU builds an execution plan for the data mining task, specifying strategies for tool and data movements. The execution plan is constructed by using the EPMS (Execution Plan Management Service). This plan is stored in the local KEPR (Knowledge Execution Plan Repository).
• Step 3. The GU sends the execution plan to the RAEMS (Resource Allocation and Execution Management Service), which starts the application.
• Step 4. The GU visualizes and evaluates the result of the computation stored in the KBR by means of the RPS (Results Presentation Service) tools.

Conclusion
• In this article we propose an implementation of a distributed algorithm for mining fuzzy association rules from distributed databases in a Knowledge Grid environment.
• The proposed algorithm exploits properties of global large fuzzy itemsets and local large fuzzy itemsets, and its reduction of computation relies heavily on them.