Association Rule Mining in
Peer-to-Peer Systems
Ran Wolff, Assaf Schuster
Department of Computer Science
Technion I.I.T., Haifa 32000, Israel
Difficulties of Distributed DB
Impracticality of global communications and global synchronization
Dynamic topology changes of the network
On-the-fly data updates
Resource sharing with other applications
Frequent failure and recovery of resources
The Algorithm Requirements
Entirely asynchronous
Imposes very little communication overhead
Transparently tolerates network topology changes and node failures
Quickly adjusts to changes in the data as they occur
Problems in LSD-ARM
There can be no global synchronization, so:
Nodes must act independently
There is no point in time at which the algorithm is known to have finished
Nodes have no way of knowing that the information they possess is final and accurate
Solution
Each node maintains an assumption of the correct result
Each node updates that result whenever new data arrives
Nodes compute the result through local negotiation with their immediate neighbors
Dynamic nature of LSD system
If the mean time between failures of a single node is 20,000 hours, a system consisting of 100,000 nodes can expect about five node failures per hour (100,000 / 20,000 = 5)
Whenever a node departs, the global DB and the result of the computation change
A similar problem occurs when new nodes join
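The failure-rate figure above can be checked with a short calculation. This is only a sketch: the function name is hypothetical, and it assumes nodes fail independently at a constant rate of 1/MTBF each.

```python
def expected_failures_per_hour(num_nodes: int, mtbf_hours: float) -> float:
    """Expected node failures per hour across the whole system.

    Assumption (not stated on the slide): every node fails independently
    at a constant rate of 1 / mtbf_hours.
    """
    return num_nodes / mtbf_hours


# Numbers from the slide: 100,000 nodes, 20,000 hours MTBF per node.
print(expected_failures_per_hour(100_000, 20_000))  # 5.0
```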
The majority voting protocol
Requires no synchronization between the computing nodes
Each node communicates only with its immediate neighbors
Locality implies that the algorithm is scalable to very large networks
Notation definition
database at time t
partition of node u at time t
the group of machines reachable from u at time t
solution of the LSD-ARM problem for node u at time t, which is a set of rules
LSD-Majority
LSD-Majority: an entirely different majority voting protocol
The purpose is to ensure that each node converges toward the correct majority
The ad hoc solution of node u is:
1: when the majority is of set bits
0: when the majority is of unset bits
The nodes communicate by sending messages containing two integers:
Count: the number of bits this message reports
Sum: the number of those bits that are equal to one
Cu is, for now, one
Δu measures the number of excess set bits u has been informed of
Δuv measures the number of excess set bits u and v have last reported to one another
Δu recalculation: each time Su changes, a message is received, or a neighbor v connects or disconnects
Δuv recalculation: each time a message is sent to or received from v
As long as Δu ≥ Δuv ≥ 0 and Δv ≥ Δvu ≥ 0, there is no need for u and v to exchange data
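The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' Algorithm 1: the class and function names are hypothetical, "excess set bits" is assumed to mean Sum minus λ times Count with λ = 1/2 for a plain majority, a tie is assumed to count as a set-bit majority, and the rule for when and what to send is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Message:
    count: int     # Count: number of bits this message reports
    sum_ones: int  # Sum: how many of those bits are equal to one


@dataclass
class Node:
    bit: int          # Su, the node's own bit (Cu is one)
    lam: float = 0.5  # assumed majority threshold
    received: Dict[str, Message] = field(default_factory=dict)  # last message from each neighbor
    sent: Dict[str, Message] = field(default_factory=dict)      # last message to each neighbor

    def delta_u(self) -> float:
        """Excess set bits u has been informed of (own bit plus neighbor reports)."""
        total_sum = self.bit + sum(m.sum_ones for m in self.received.values())
        total_count = 1 + sum(m.count for m in self.received.values())
        return total_sum - self.lam * total_count

    def delta_uv(self, v: str) -> float:
        """Excess set bits u and v have last reported to one another."""
        zero = Message(0, 0)
        out, inc = self.sent.get(v, zero), self.received.get(v, zero)
        return (out.sum_ones + inc.sum_ones) - self.lam * (out.count + inc.count)

    def output(self) -> int:
        """Ad hoc solution: 1 if set bits are in the majority (ties assumed to give 1)."""
        return 1 if self.delta_u() >= 0 else 0


def no_exchange_needed(u: Node, u_name: str, v: Node, v_name: str) -> bool:
    """The quiescence condition from the slide, checked for one edge."""
    return (u.delta_u() >= u.delta_uv(v_name) >= 0
            and v.delta_u() >= v.delta_uv(u_name) >= 0)
```

For two neighbors that both hold a set bit and have not yet exchanged anything, `no_exchange_needed` returns `True`; if one of them holds an unset bit, its Δ is negative and the condition fails, so that edge still has data to exchange.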
Algorithm 1: LSD-Majority
Generalize LSD-Majority for frequency counts
Cu: size of the local database
Su: local support of an itemset
λ: MinFreq
Thus the resulting protocol will decide whether an itemset is frequent or not.
Cu: the number of transactions in the local database that include X
Su: the number of those transactions that include both X and Y
λ: MinConf
Thus the result will decide whether a rule X→Y is confident or not.
Deciding whether a rule is correct or false requires that each node run two instances of the protocol, one for frequency and one for confidence.
This way LSD-Majority efficiently decides whether a candidate rule is correct or false.
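The two parameter mappings above can be illustrated on a single node's local data. This is a hedged sketch: the function names and the toy transactions are invented for illustration, the frequency instance is assumed to run on the itemset X ∪ Y (the usual association-rule definition), and the vote shown is purely local, whereas in the protocol the counts reported by neighbors would be added in before deciding.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]


def frequency_counts(db: List[Transaction], itemset: FrozenSet[str]) -> Tuple[int, int]:
    """(Cu, Su) for the frequency instance: database size and local support."""
    return len(db), sum(1 for t in db if itemset <= t)


def confidence_counts(db: List[Transaction], x: FrozenSet[str],
                      y: FrozenSet[str]) -> Tuple[int, int]:
    """(Cu, Su) for the confidence instance of the rule X -> Y."""
    with_x = [t for t in db if x <= t]
    return len(with_x), sum(1 for t in with_x if y <= t)


def local_vote(count: int, sum_ones: int, lam: float) -> bool:
    """True when Su - lam * Cu >= 0, i.e. the local ratio reaches the threshold."""
    return sum_ones - lam * count >= 0


def rule_holds_locally(db: List[Transaction], x: FrozenSet[str], y: FrozenSet[str],
                       min_freq: float, min_conf: float) -> bool:
    """Two instances: frequency of X ∪ Y (assumed) and confidence of X -> Y."""
    frequent = local_vote(*frequency_counts(db, x | y), min_freq)
    confident = local_vote(*confidence_counts(db, x, y), min_conf)
    return frequent and confident


# Toy local database, invented for illustration.
db = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"},
)]
x, y = frozenset({"bread"}), frozenset({"milk"})
print(frequency_counts(db, x | y))   # (4, 2)
print(confidence_counts(db, x, y))   # (3, 2)
print(rule_holds_locally(db, x, y, min_freq=0.5, min_conf=0.6))  # True
```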
Majority-Rule
Each node must take into account not only the local data, but also data brought to it by LSD-Majority.
An algorithm which never really finishes discovering all itemsets must generate rules on the fly.
Majority-Rule
Conclusion
A distributed majority vote protocol, LSD-Majority, as part of the algorithm
An algorithm, Majority-Rule, that mines association rules on distributed systems of unlimited size
Its key quality is its locality
It also offers fast convergence of the result and low communication demands