Association Rule Mining in Peer-to-Peer Systems

Ran Wolff, Assaf Schuster
Department of Computer Science, Technion I.I.T., Haifa 32000, Israel

Difficulties of Distributed DB
- Impracticality of global communication and global synchronization
- Dynamic topology changes of the network
- On-the-fly data updates
- Resource sharing with other applications
- Frequent failure and recovery of resources

The Algorithm Requirements
- Entirely asynchronous
- Imposes very little communication overhead
- Transparently tolerates network topology changes and node failures
- Quickly adjusts to changes in the data as they occur

Problems in LSD-ARM
- There can be no global synchronization
- Nodes must act independently
- There is no point in time at which the algorithm is known to have finished
- Nodes have no way of knowing that the information they possess is final and accurate

Solution
- Each node maintains an assumption of the correct result
- Each node updates the result whenever new data arrives
- Nodes compute the result through local negotiation with their immediate neighbors

Dynamic nature of LSD system
- If the mean time between failures of a single node is 20,000 hours, a system of 100,000 nodes could easily see five failures per hour (100,000 / 20,000 = 5)
- Whenever a node departs, the global DB and the result of the computation change
- A similar problem occurs when new nodes join

The majority voting protocol
- Requires no synchronization between the computing nodes
- Each node communicates only with its immediate neighbors
- Locality implies that the algorithm scales to very large networks

Notation definition
- The database at time t
- The partition of node u at time t
- The group of machines reachable from u at time t
- The solution of the LSD-ARM problem for node u at time t, which is a set of rules

LSD-Majority
- LSD-Majority: an entirely different majority voting protocol
- The purpose is to ensure that each node converges toward the correct majority
- The ad hoc solution of node u is 1 when the majority is of set bits and 0 when the majority is of unset bits
- The nodes communicate by sending messages containing two integers:
  Count: the number of bits the message reports
  Sum: the number of those bits that are equal to one
- Cu is, for now, one
- △u measures the number of excess set bits u has been informed of
- △uv measures the number of excess set bits u and v have last reported to one another
- △u is recalculated each time Su changes, a message is received, or a node connects to v or disconnects from v
- △uv is recalculated each time a message is sent to or received from v
- As long as △u ≥ △uv ≥ 0 and △v ≥ △vu ≥ 0, there is no need to exchange data

Algorithm 1: LSD-Majority

Generalize LSD-Majority for frequency counts
- Cu: the size of the local database
- Su: the local support of an itemset
- λ: MinFreq
- Thus the resulting protocol decides whether an itemset is frequent or not

For confidence, the same protocol is used with:
- Cu: the number of transactions in the local database that include X
- Su: the number of these transactions that include both X and Y
- λ: MinConf
- Thus the result decides whether a rule X→Y is confident or not

- Deciding whether a rule is correct or false requires that each node run two instances of the protocol
- This way LSD-Majority efficiently decides whether a candidate rule is correct or false

Majority-Rule
- Each node must take into account not only the local data, but also data brought to it by LSD-Majority
- An algorithm that never really finishes discovering all itemsets must generate rules on the fly

Conclusion
- A distributed majority vote protocol, LSD-Majority, as part of the algorithm
- An algorithm, Majority-Rule, that mines association rules on distributed systems of unlimited size
- Its key quality is its locality
- Also fast convergence of the result and low communication demands
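
The way frequency and confidence are each reduced to a majority vote over (Sum, Count) pairs can be illustrated with a small sketch. It is not the authors' implementation: the names `Vote`, `excess`, `majority` and `rule_is_correct` are hypothetical, the excess is assumed to take the form Sum − λ·Count, and the frequency vote is assumed to be taken on the itemset X ∪ Y. The sketch also adds the counts up in one place, whereas LSD-Majority is meant to reach the same decision through messages between neighbors only.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Vote:
    """One node's contribution: the two integers an LSD-Majority message carries."""
    sum_ones: int  # Sum: number of reported bits equal to one
    count: int     # Count: number of bits reported


def excess(votes, lam):
    """Excess of set bits over the threshold lam (assumed form: Sum - lam * Count)."""
    total_sum = sum(v.sum_ones for v in votes)
    total_count = sum(v.count for v in votes)
    return total_sum - lam * total_count


def majority(votes, lam):
    """Return 1 when the set bits reach the threshold, otherwise 0."""
    return 1 if excess(votes, lam) >= 0 else 0


def rule_is_correct(freq_votes, conf_votes, min_freq, min_conf):
    """A rule X -> Y is taken as correct when both vote instances return 1."""
    return majority(freq_votes, min_freq) == 1 and majority(conf_votes, min_conf) == 1


# Hypothetical data for three nodes.
# Frequency vote: Count = local database size, Sum = local support of X u Y.
freq_votes = [Vote(30, 100), Vote(20, 200), Vote(10, 50)]
# Confidence vote: Count = transactions containing X, Sum = those containing X and Y.
conf_votes = [Vote(30, 40), Vote(20, 50), Vote(10, 10)]

min_freq = Fraction(1, 10)  # MinFreq
min_conf = Fraction(1, 2)   # MinConf

print(excess(freq_votes, min_freq))
print(rule_is_correct(freq_votes, conf_votes, min_freq, min_conf))
```

With these made-up numbers the rule passes both votes. Raising MinConf to 3/4 makes the confidence vote fail, so the rule is rejected. `Fraction` is used for the thresholds so that a vote landing exactly on the threshold is not affected by floating-point rounding.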