Distributed Data Mining System in Java

Group members
  D91725001 王春笙
  D92725002 林俊甫
  D92725001 王慧芬

Overview of Project: Motivation and goals
  - Multi-layer data mining over a large data file is time-consuming.
  - Join forces to improve performance: use the computing power of several machines spread over the network.
  - Fault tolerance: the mining process continues despite a server crash.

System Architecture
  [Diagram labels: Prediction engine; Node, Node, Node; distributed mining; Web Server; Log files; Request service module; HTTP; Web Client]

Technological Infrastructure: System diagram
  [Diagram labels: Client, Client, Client, ...; LAN; Mining data chunk; Server/Coordinator]

Project Timeline
  [Gantt chart with daily ticks over 2003/11 and 2003/12; the task bars are not recoverable. IDs 3, 10, 17 and 20 are blank separator rows.]

  ID   Task name
  1    System Analysis
  2    System Design

  4    Server Coordinator
  5    Server MsgInterf
  6    Server FileDispatch
  7    Server IntegrationT.
  8    Server Comm.
  9    Server Comm. Test

  11   Client Join/Leave
  12   Client Rsc lookup
  13   Client Get Rsc
  14   Client Integration T.
  15   Client Comm.
  16   Client Comm. Test

  18   GUI Design
  19   GUI Test

  21   Integration Test
  22   System Test
  23   Documentation

Job Distribution
  Server programming: 林俊甫
  Client programming: 王春笙
  GUI programming and integration: 王慧芬

Technological Infrastructure: System design requirements
  - Transparency
  - Scalability
  - Dynamic join problem
  - Multi-threading
  - RMI
  - Multicast socket
  - Redundancy: server crash failure, client crash failure

Technological Infrastructure: Rationale/justification
  - Data mining is a computing-intensive task.
  - Web log data may be generated so quickly that a single computer cannot handle it.
  - Implementing a distributed prediction engine brings the advantage of fault tolerance.

Technological Infrastructure: Alternatives considered
  - Fully distributed data mining system: each participant acts as an autonomous peer-to-peer node.
  - Client/server distributed data mining system: the data server acts as a fixed coordinator.

Implementation Phase: System requirement
  Hardware
  - 2 or 3 PCs running Microsoft Windows. One acts as server; the others act as clients as well as redundant servers.
  Software
  - For implementation: J2SE SDK 1.4.1, Eclipse 2.1, NetBeans 3.5.1
  - For execution: Java Web Start

Implementation Phase: Implementation Logic, Server/Coordinator
  - Activates at a well-known port and waits for client connections, handling each connection in its own thread.
  - Logs the information of every connected client in a hash table.
  - Dispatches the designated mining data to clients.
  - Maintains the hash table and multicasts it to every client periodically.
  - Merges and displays the results returned from clients.
  - Detects the connection status of each client. If a client fails, the server performs the backup mechanism and orders a backup client to take over the failed client's job.

Implementation Phase: Implementation, Client
  - Once activated, enrolls with the server (coordinator).
  - Receives the hash table broadcast by the server, updating its local hash table periodically, and receives the mining data sent from the server.
  - Performs the data mining and returns the result to the server (coordinator).
  - Monitors the server connection. If the server is not alive, performs the backup mechanism to elect a client to act as backup server.

Implementation Phase: Failure and backup mechanism
  Client failure: The server is informed of the connection failure with the client. The server then updates the connection information in the hash table, finds a client in the hash table that has no designated job, and dispatches the unfinished job to that client.
  Server failure: All clients are informed of the connection failure with the server. Because every client keeps all connection information in a hash table that the server updates periodically, after the server fails all clients elect a new server through the same election mechanism. The new server then broadcasts the result to all clients and enters the server listening state.
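The failure and backup mechanism above can be illustrated with a minimal sketch of the hash-table bookkeeping. It rests on several assumptions: the slides do not name the election rule, so "lowest surviving client ID wins" is used here purely as one deterministic choice that every client can compute from the same table; the class `FailoverSketch`, its methods, and the `host:port` strings are hypothetical; networking (RMI, multicast) is left out; and the code uses current Java syntax rather than the J2SE 1.4.1 the project targeted.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the connection-table bookkeeping; names are not from the slides. */
class FailoverSketch {

    /** One table entry: a client's address and the job (data chunk) it holds, or null if idle. */
    static final class NodeInfo {
        final String address;
        String job;
        NodeInfo(String address, String job) {
            this.address = address;
            this.job = job;
        }
    }

    /** Connection table keyed by client ID; a TreeMap keeps the IDs in ascending order. */
    final TreeMap<Integer, NodeInfo> table = new TreeMap<>();

    void register(int id, String address, String job) {
        table.put(id, new NodeInfo(address, job));
    }

    /**
     * Client failure: remove the failed client, find a client with no designated job,
     * and give it the unfinished job. Returns the ID of the client that took over,
     * or -1 if the failed client had no job or no idle client exists.
     */
    int handleClientFailure(int failedId) {
        NodeInfo failed = table.remove(failedId);
        if (failed == null || failed.job == null) {
            return -1;
        }
        for (Map.Entry<Integer, NodeInfo> entry : table.entrySet()) {
            if (entry.getValue().job == null) {
                entry.getValue().job = failed.job;
                return entry.getKey();
            }
        }
        return -1; // nobody is idle; a fuller design would queue the job
    }

    /**
     * Server failure: every client holds the same table, so each one can evaluate the
     * same rule locally. ASSUMED rule: the lowest client ID becomes the new server.
     * Returns -1 if no clients are known.
     */
    int electNewServer() {
        return table.isEmpty() ? -1 : table.firstKey();
    }

    public static void main(String[] args) {
        FailoverSketch s = new FailoverSketch();
        s.register(1, "pc-a:5000", "chunk-0");
        s.register(2, "pc-b:5000", "chunk-1");
        s.register(3, "pc-c:5000", null);   // idle backup client

        System.out.println("takeover by client " + s.handleClientFailure(2)); // takeover by client 3
        System.out.println("new server: client " + s.electNewServer());       // new server: client 1
    }
}
```

Because the rule depends only on the table contents, clients whose tables are in sync reach the same answer without exchanging further messages; clients holding a stale table could disagree, which is one reason the periodic table multicast matters.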
Implementation Phase: Data mining algorithm
  - Uses an Apriori-like sequential pattern mining algorithm.
  - Each client mines its data partition and sends the results to the coordinator (server).
  - The coordinator receives the clients' mining results, takes their union, and validates the results by scanning all the data again.
  - Results are presented as association rules.

Implementation Phase: Installation
  Server: Web log file, Server module, Client module
  Client: Client module, Server module
  The role of a node in the mining process may change.

Implementation Phase: Test
  - Component (server, client, UI) unit test
  - System integration test
  - Fault tolerance test: component error, transmission error, network error, host error
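The partition-then-validate scheme from the data mining algorithm slide (clients mine their partitions, the coordinator unions the results and rescans all data) can be sketched as follows. This is a simplified illustration, not the project's code: it treats each web-log session as an unordered set of pages instead of an ordered sequence, stops at itemsets of size two, and does not derive association rules. The class `PartitionMiningSketch`, its method names, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: local Apriori-style mining per partition, then global validation. */
class PartitionMiningSketch {

    static Set<String> setOf(String... items) {
        return new TreeSet<>(Arrays.asList(items));
    }

    /** Number of transactions that contain every item of the itemset. */
    static int support(Set<String> itemset, List<Set<String>> transactions) {
        int count = 0;
        for (Set<String> t : transactions) {
            if (t.containsAll(itemset)) count++;
        }
        return count;
    }

    /** Client side: frequent 1-itemsets, then pairs built only from frequent items. */
    static Set<Set<String>> mineLocal(List<Set<String>> partition, double minSupport) {
        int minCount = (int) Math.ceil(minSupport * partition.size());
        Set<String> items = new TreeSet<>();
        for (Set<String> t : partition) items.addAll(t);

        Set<Set<String>> frequent = new HashSet<>();
        List<String> frequentItems = new ArrayList<>();
        for (String item : items) {                              // level 1
            if (support(setOf(item), partition) >= minCount) {
                frequentItems.add(item);
                frequent.add(setOf(item));
            }
        }
        for (int i = 0; i < frequentItems.size(); i++) {         // level 2
            for (int j = i + 1; j < frequentItems.size(); j++) {
                Set<String> pair = setOf(frequentItems.get(i), frequentItems.get(j));
                if (support(pair, partition) >= minCount) frequent.add(pair);
            }
        }
        return frequent;
    }

    /** Coordinator side: union the local results, then rescan all data to validate them. */
    static Map<Set<String>, Integer> mergeAndValidate(List<List<Set<String>>> partitions,
                                                      double minSupport) {
        Set<Set<String>> candidates = new HashSet<>();
        List<Set<String>> all = new ArrayList<>();
        for (List<Set<String>> p : partitions) {
            candidates.addAll(mineLocal(p, minSupport)); // would run on a client in the real system
            all.addAll(p);
        }
        int minCount = (int) Math.ceil(minSupport * all.size());
        Map<Set<String>, Integer> result = new HashMap<>();
        for (Set<String> c : candidates) {
            int count = support(c, all);
            if (count >= minCount) result.put(c, count);
        }
        return result;
    }

    /** Two made-up partitions of four sessions each. */
    static List<List<Set<String>>> demoPartitions() {
        return Arrays.asList(
            Arrays.asList(setOf("A", "B"), setOf("A", "B", "C"), setOf("A"), setOf("C")),
            Arrays.asList(setOf("B", "C"), setOf("B", "C"), setOf("A", "B"), setOf("A", "B")));
    }

    public static void main(String[] args) {
        Map<Set<String>, Integer> global = mergeAndValidate(demoPartitions(), 0.5);
        System.out.println(global.get(setOf("A", "B")));         // 4
        System.out.println(global.containsKey(setOf("B", "C"))); // false
    }
}
```

In the sample data the pair {B, C} is frequent in the second partition only, so it survives the union but is dropped by the rescan. The rescan cannot miss a globally frequent itemset as long as every partition uses the same relative threshold, because an itemset that falls below the threshold in every partition also falls below it overall.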