Distributed Data Mining
System in Java
Group Members
D91725001 王春笙
D92725002 林俊甫
D92725001 王慧芬
Overview of Project
 Motivation and goals
   Multi-layer data mining over a large data file is time-consuming
   Join forces to improve performance
       Use the computing power of several machines spread across the network
   Fault tolerance
       The mining process continues despite a server crash
System Architecture
[Figure: system architecture. Labeled components: Web Client, HTTP, Web Server, Request service module, Log files, and a Prediction engine performing distributed mining across three Nodes.]
Technological Infrastructure
 System diagram
[Figure: several Clients connected over a LAN to the Server/Coordinator; the link is labeled "Mining data chunk".]
Project Timeline
ID  Task name
1   System Analysis
2   System Design
3
4   ServerCoordinator
5   Server MsgInterf
6   Server FileDispatch
7   Server IntegrationT.
8   Server Comm.
9   Server Comm. Test
10
11  Client Join/Leave
12  Client Rsc lookup
13  Client Get Rsc
14  Client Integration T.
15  Client Comm.
16  Client Comm. Test
17
18  GUI Design
19  GUI Test
20
21  Integration Test
22  System Test
23  Documentation
[Gantt chart: calendar axis labeled 2003/11 and 2003/12, with daily ticks.]
Job Distribution
 Server programming
   林俊甫
 Client programming
   王春笙
 GUI programming and integration
   王慧芬
Technological Infrastructure
 System design requirements
   Transparency
   Scalability
       Dynamic join problem
       Multi-threading
       RMI
       Multicast
       Socket
   Redundancy
       Server crash failure
       Client crash failure
Technological Infrastructure
 Rationale/justification
 Data mining is a computing-intensive task
 Web log data may be generated so quickly that a single computer cannot handle it
 Implementing a distributed prediction engine also brings the advantage of fault tolerance
Technological Infrastructure
 Alternatives considered
 Fully distributed data mining system
   Each participant acts as an autonomous peer-to-peer node
 Client/server distributed data mining system
   The data server acts as a fixed coordinator
Implementation Phase
 System requirements
   Hardware
       2 or 3 PCs running Microsoft Windows
           1 acts as the server; the others act as clients and as redundant servers
   Software
       For implementation
           J2SE SDK 1.4.1
           Eclipse 2.1
           NetBeans 3.5.1
       For execution
           Java Web Start
Implementation Phase
 Implementation Logic
 Server/Coordinator
   Starts on a well-known port and waits for client connections, handling each one in its own thread. Logs the information of every connected client in a hash table and dispatches the designated mining data to the clients.
   Maintains the hash table and multicasts it to every client periodically.
   Merges and displays the results returned by the clients.
   Monitors the connection status of each client. If a client fails, the server runs the backup mechanism and orders a backup client to take over the failed client's job.
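The bookkeeping described above can be sketched as a small table that maps each client to the data chunk it is mining. This is a minimal sketch, not the project's code: the class `ClientTable`, its method names, and the string identifiers are hypothetical, networking is left out, and it uses generics from later Java versions rather than J2SE 1.4.1. It also assumes that a chunk with no idle client available simply waits for the next client to join, which the slides do not specify.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical server-side table: which client is mining which data chunk. */
class ClientTable {
    // clientId -> chunk id being mined, or null when the client is idle (a backup).
    private final Map<String, String> jobs = new LinkedHashMap<String, String>();
    // Assumption: chunks that lost their client while no idle client existed wait here.
    private final Deque<String> pending = new ArrayDeque<String>();

    /** Called when a client connects; it starts idle unless a chunk is waiting. */
    synchronized void register(String clientId) {
        if (!jobs.containsKey(clientId)) {
            jobs.put(clientId, pending.poll()); // poll() returns null when nothing waits
        }
    }

    /** Records that the given chunk was sent to the given client. */
    synchronized void dispatch(String clientId, String chunkId) {
        if (!jobs.containsKey(clientId)) {
            throw new IllegalArgumentException("unknown client: " + clientId);
        }
        jobs.put(clientId, chunkId);
    }

    /**
     * Removes a failed client and hands its unfinished chunk to the first idle client.
     * Returns the id of the client that took over, or null if nothing was reassigned.
     */
    synchronized String handleClientFailure(String failedClientId) {
        String orphan = jobs.remove(failedClientId);
        if (orphan == null) {
            return null; // unknown or idle client: nothing to reassign
        }
        for (Map.Entry<String, String> entry : jobs.entrySet()) {
            if (entry.getValue() == null) {
                entry.setValue(orphan);
                return entry.getKey();
            }
        }
        pending.add(orphan); // no idle client right now
        return null;
    }

    /** Returns the chunk assigned to a client, or null if it is idle or unknown. */
    synchronized String chunkOf(String clientId) {
        return jobs.get(clientId);
    }
}
```

In a threaded server, each connection-handling thread would call `register` on connect and `handleClientFailure` when its socket breaks, which is why the methods are synchronized.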
Implementation Phase
 Implementation
 Client
   Once activated, enrolls with the server (coordinator).
   Receives the hash table broadcast by the server, periodically updating its local copy, and receives the mining data sent by the server.
   Performs the data mining and returns the result to the server (coordinator).
   Monitors the connection to the server. If the server is no longer alive, runs the backup mechanism to elect a client to act as backup server.
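One way a client could detect that the server is gone is to treat the periodic hash table updates as a heartbeat. The sketch below assumes exactly that; the slides do not say how the connection is checked, and the class `ServerLivenessMonitor`, its method names, and the timeout value are hypothetical. Timestamps are passed in explicitly so the logic can be exercised without a network.

```java
/** Hypothetical client-side check: the server is presumed dead after a silent period. */
class ServerLivenessMonitor {
    private final long timeoutMillis;
    private long lastUpdateMillis;

    ServerLivenessMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastUpdateMillis = nowMillis;
    }

    /** Call whenever a hash table update arrives from the server. */
    synchronized void onTableReceived(long nowMillis) {
        lastUpdateMillis = nowMillis;
    }

    /** True while the last update is no older than the timeout. */
    synchronized boolean isServerAlive(long nowMillis) {
        return nowMillis - lastUpdateMillis <= timeoutMillis;
    }
}
```

A client would call `onTableReceived(System.currentTimeMillis())` from its receiving thread and poll `isServerAlive(System.currentTimeMillis())` from a timer, starting the election once it returns false.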
Implementation Phase
 Failure and backup mechanism
   Client failure:
       The server is informed that the connection to the client has failed.
       The server then updates the connection information in the hash table, finds a client in the hash table that has no designated job, and dispatches the unfinished job to that client.
   Server failure:
       All clients are informed that the connection to the server has failed.
       Because every client keeps the full connection information in a hash table that the server updates periodically, all clients can elect a new server through the same election mechanism after the server fails.
       The new server then broadcasts the result to all clients and enters the server listening state.
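Because every client holds the same table, a deterministic rule applied locally yields the same winner everywhere without extra messages. The slides do not state which rule is used; the sketch below assumes the surviving client with the lexicographically smallest identifier wins. The class `Election` and the identifiers are hypothetical.

```java
import java.util.Collection;

/** Hypothetical election rule: the smallest surviving identifier becomes the new server. */
class Election {
    /**
     * Picks the new server from the identifiers in the local hash table.
     * Returns null when no candidate other than the failed server is known.
     */
    static String electServer(Collection<String> knownNodes, String failedServer) {
        String winner = null;
        for (String id : knownNodes) {
            if (id.equals(failedServer)) {
                continue; // the crashed server cannot be elected
            }
            if (winner == null || id.compareTo(winner) < 0) {
                winner = id;
            }
        }
        return winner;
    }
}
```

Each client runs `electServer` on its own copy of the table; the client whose identifier equals the result switches to the server role. The outcome is only consistent if all copies of the table agree, which depends on how recently the server multicast it.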
Implementation Phase
 Data mining algorithm
 Uses a sequential pattern mining algorithm
 Apriori-like
 Each client mines its data partition and sends the results to the coordinator (server)
 The coordinator receives the clients' mining results, takes their union, and validates them by scanning all the data again
 Results are presented as association rules
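The coordinator's union-and-validate step can be sketched as follows. This is a simplified illustration rather than the project's algorithm: it treats patterns as unordered itemsets instead of ordered sequences, and the class `CandidateValidator`, the method name, and the minimum support parameter are hypothetical. The rescan is needed because a pattern that is frequent in one partition is not necessarily frequent over the whole data set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical coordinator step: merge client candidates, then recount on all data. */
class CandidateValidator {
    /**
     * clientResults: one set of locally frequent itemsets per client.
     * allTransactions: the complete data set held by the coordinator.
     * Returns each itemset whose global count reaches minSupportCount, with that count.
     */
    static Map<Set<String>, Integer> validate(List<Set<Set<String>>> clientResults,
                                              List<Set<String>> allTransactions,
                                              int minSupportCount) {
        // Union of everything the clients reported; duplicates collapse automatically.
        Set<Set<String>> candidates = new HashSet<Set<String>>();
        for (Set<Set<String>> result : clientResults) {
            candidates.addAll(result);
        }
        // Second scan over all data to obtain the true global support of each candidate.
        Map<Set<String>, Integer> frequent = new HashMap<Set<String>, Integer>();
        for (Set<String> candidate : candidates) {
            int count = 0;
            for (Set<String> transaction : allTransactions) {
                if (transaction.containsAll(candidate)) {
                    count++;
                }
            }
            if (count >= minSupportCount) {
                frequent.put(candidate, count);
            }
        }
        return frequent;
    }
}
```

The validated itemsets and their counts are the input from which association rules would then be derived.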
Implementation Phase
 Installation
   Server
       Web log file
       Server module
       Client module
   Client
       Client module
       Server module
 The role of a node in the mining process may change
Implementation Phase
 Test
 Component (server, client, UI) unit tests
 System integration test
 Fault tolerance test
 Component error
 Transmission error
 Network error
 Host error