Distributed Data Mining System in Java
Group Member
王春笙,林俊甫,王慧芬
Overview of Project
• Project participants
– 王春笙,林俊甫,王慧芬
Project Programming Tasks
• D92725002 林俊甫
– Polling and reply multicast between client and server
– Client/Server socket programming
– Client dynamic join and leave mechanism
– Multi-thread programming
– Synchronization mechanism
– Data chunk maintenance and dispatching mechanism
– Client/Server communication link control
Project Programming Tasks (cont’d)
– Client failure handling
• Reassign the backup server if the failed client was the backup
• Restore the failed client’s work (with 王春笙)
– Server failure handling
• Backup server designation mechanism and logic design
– RMI mechanism (with 王春笙)
– Basic GUI
System Infrastructure
• System diagram: several clients are connected over a LAN to one Server/Coordinator. Mining data chunks and mining results travel between the clients and the server.
Basic Operation
• Server side: listens for multicast group queries and replies; forks a thread to handle each client connection; waits for the client’s processed result, then orders the client to get another file chunk.
• Client side: once the server is found, connects to it; on receiving the server’s instruction, invokes RMI to get the file chunk.
• Message sequence (time runs downward):
1. Client polls on port 4444, group 230.0.0.1, with “@”: who is server?
2. Server replies: “Servername: I am the server”
3. Client connects to <servername, port 4445>
4. Server: “Client do: filechunk#”
5. Client: “ok”
6. Server: “Client do: next filechunk#”
7. …
8. …
Port Assignment
• Port 4444: for multicast
• Port 4445: for TCP/IP socket connection
• Port 4446: for RMI services
Finding A Server
• Once a client starts up, it queries every 3 seconds over multicast group 230.0.0.1, port 4444, by sending the 1-byte string “@” to locate the server host.
• Once a server starts up, it forks a thread to deal with these queries.
• Client cycle:
1. Client query: who is the server now?
2. Listen for the server’s response.
3. Connect to the server on port 4445.
4. Use RMI to get a file chunk from the server.
5. Process the data mining and return the result to the server.
6. On detecting a server failure: if I am the backup, go to the backup server procedure; otherwise go to step 1.
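The client-side query can be sketched roughly as follows. This is a minimal illustration, not the project's code: the class name `ServerFinder` and its methods are hypothetical, and it assumes the server answers by unicast to the address the query came from.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical helper; names are illustrative and not taken from the project.
final class ServerFinder {
    static final int QUERY_PORT = 4444;  // multicast port given in the slides
    static final int PERIOD_MS = 3000;   // query period given in the slides

    // Builds the 1-byte "@" query addressed to multicast group 230.0.0.1.
    static DatagramPacket buildQuery() {
        try {
            InetAddress group = InetAddress.getByName("230.0.0.1");
            byte[] payload = { (byte) '@' };
            return new DatagramPacket(payload, payload.length, group, QUERY_PORT);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e); // not expected for a literal address
        }
    }

    // Sends the query every PERIOD_MS until some host replies, then returns
    // that host. Assumption: the reply is sent by unicast to this socket.
    static InetAddress findServer() throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(PERIOD_MS);
            byte[] buf = new byte[256];
            while (true) {
                socket.send(buildQuery());
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(reply);
                    return reply.getAddress();
                } catch (SocketTimeoutException e) {
                    // no answer within the period: ask again
                }
            }
        }
    }
}
```

A client would call `ServerFinder.findServer()` and then open a TCP connection to the returned host on port 4445.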
File Dispatching
• The server maintains a file chunk pool.
• Each entry in FileChunks holds a state: -1: empty, 0: available, 1: using, 2: used
• The server finds an available file chunk for a client, sets it to 1, and orders the client to get this file chunk by RMI. The file chunk is updated to 2 when the client returns its result.
• Recovery: when the server detects that a client’s link is broken, it restores the file chunk allocated to that client to 0.
• The file chunk class is declared Serializable so that it can be passed by RMI to the backup server.
• The file chunk class uses synchronization for concurrency control.
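A minimal sketch of such a pool, assuming one state code per chunk and a per-chunk owner id so that recovery can find the chunks a failed client was holding. The class `FileChunkPool` and its method names are hypothetical; only the state codes come from the slides.

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical sketch; names are illustrative, not the project's classes.
final class FileChunkPool implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int EMPTY = -1, AVAILABLE = 0, USING = 1, USED = 2;

    private final int[] state;  // one state code per chunk
    private final int[] owner;  // id of the client holding the chunk, or -1

    FileChunkPool(int chunkCount) {
        state = new int[chunkCount];  // zero-initialised, i.e. AVAILABLE
        owner = new int[chunkCount];
        Arrays.fill(owner, -1);
    }

    // Marks the first available chunk as USING for the client.
    // Returns the chunk index, or -1 when nothing is available.
    synchronized int assign(int clientId) {
        for (int i = 0; i < state.length; i++) {
            if (state[i] == AVAILABLE) {
                state[i] = USING;
                owner[i] = clientId;
                return i;
            }
        }
        return -1;
    }

    // Called when the client returns its mining result for the chunk.
    synchronized void complete(int chunk) {
        if (state[chunk] == USING) {
            state[chunk] = USED;
            owner[chunk] = -1;
        }
    }

    // Recovery: chunks held by a client whose link broke become available again.
    synchronized void release(int clientId) {
        for (int i = 0; i < state.length; i++) {
            if (state[i] == USING && owner[i] == clientId) {
                state[i] = AVAILABLE;
                owner[i] = -1;
            }
        }
    }

    synchronized int stateOf(int chunk) {
        return state[chunk];
    }
}
```

Making every method `synchronized` keeps the check-then-set in `assign` atomic when several client-handler threads ask for work at once.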
Backup Server Selection
• The server maintains and assigns a unique id to each client.
• The unique id is incremented as a serial number.
• The client with the smallest id is assigned as backup server.
• When a client fails, the server checks whether it was the backup server; if so, it restarts the selection process.
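The selection rule can be sketched as below. `ClientRegistry` and its methods are hypothetical names, and the starting id of 1 is an assumption; the slides only say that ids are serial and that the smallest id wins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of serial id assignment and smallest-id backup selection.
final class ClientRegistry {
    private final List<Integer> clientIds = new ArrayList<Integer>();
    private int nextId = 1;     // serial number; the starting value is assumed
    private int backupId = -1;  // -1 means no backup is currently assigned

    // A client joins: it gets the next serial id; a backup is chosen if none exists.
    synchronized int join() {
        int id = nextId++;
        clientIds.add(id);
        if (backupId == -1) {
            backupId = selectBackup();
        }
        return id;
    }

    // A client fails: selection is redone only if the failed client was the backup.
    synchronized void fail(int id) {
        clientIds.remove(Integer.valueOf(id));
        if (id == backupId) {
            backupId = selectBackup();
        }
    }

    // Returns the smallest connected id, or -1 when no client is connected.
    private int selectBackup() {
        int smallest = -1;
        for (int id : clientIds) {
            if (smallest == -1 || id < smallest) {
                smallest = id;
            }
        }
        return smallest;
    }

    synchronized int currentBackup() {
        return backupId;
    }
}
```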
Nodes Maintenance
• The server maintains the connected clients’ records in an ArrayList.
• The ArrayList is composed of objects of class Nodes, which record each client’s detailed information.
• Diagram: ArrayList ht, drawn with Key and Value columns; each Nodes value holds Id, Address, Port, Work on, and Status.
RMI Services
• The RMI services are written as an independent program because both the server and the client (when it acts as backup server) use them.
• The RMI services provide:
– Backing up server data to the backup server
– Getting a file chunk from the server
– Returning the mining result to the server
– Receiving node information from the server
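As an illustration only, the four services could be declared in a remote interface like the one below. The interface name, method names, and parameter types are all assumptions; the slides do not show the real signatures. The in-memory implementation exists just to show the contract without any networking.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote interface; every name and type here is an assumption.
interface MiningService extends Remote {
    byte[] getFileChunk(int chunkId) throws RemoteException;
    void returnResult(int chunkId, String result) throws RemoteException;
    List<String> getNodeInfo() throws RemoteException;
    String[] getServerBackup() throws RemoteException;
}

// Local, non-exported implementation used only to illustrate the contract.
final class InMemoryMiningService implements MiningService {
    private final byte[][] chunks;
    private final String[] results;
    private final List<String> nodes = new ArrayList<String>();

    InMemoryMiningService(byte[][] chunks) {
        this.chunks = chunks;
        this.results = new String[chunks.length];
    }

    // Hands out a copy of the requested chunk.
    public synchronized byte[] getFileChunk(int chunkId) {
        return chunks[chunkId].clone();
    }

    // Stores the mining result reported for a chunk.
    public synchronized void returnResult(int chunkId, String result) {
        results[chunkId] = result;
    }

    // Returns a snapshot of the node descriptions.
    public synchronized List<String> getNodeInfo() {
        return new ArrayList<String>(nodes);
    }

    // Returns a snapshot of the results collected so far, for the backup server.
    public synchronized String[] getServerBackup() {
        return results.clone();
    }
}
```

To serve such an object remotely, one would typically export it with `UnicastRemoteObject.exportObject` and bind the stub in an RMI registry; how the project wires this to port 4446 is not shown in the slides.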
Client Failure
• Actions taken by the server:
– Recovery
– Reassignment
– Redo the backup server selection if the failed node is the backup
• Actions taken by the client:
– Do nothing, unless it is told by the server to act as backup
Server Failure
• Server S and Client A (time runs downward):
1. The server runs the backup selection and chooses A as backup. A is told by S that it is the backup, and A invokes RMI to get all server data (A: do backup; RMI get file; RMI reply).
2. A periodically gets the server’s services and file chunk data, while normal work continues (“Client do #”, “do reply”).
3. Server crash (X). A detects the broken communication link and starts the ServerAction class.
4. A creates a server socket at 4445 and forks a thread to listen for queries and wait for connections.
• Client B (time runs downward):
1. B receives instructions as discussed before.
2. B detects the broken communication link and multicasts the query “who is the server now?”
– B polling @: who is server?
– A reply: I am the server
– Connect to A:4445
3. B learns that A is the backup and reconnects to A.
Server/Client Life Cycle
• Life-cycle diagram: shows the Server and Client states, an “evolve” transition, and Normal/Abnormal Termination for each.
Project Programming Tasks
• D91725001 王春笙
– Web log file preprocessing and separating
– Web pages traversal sequences parsing
– Page items transferring and mapping
– Web pages sequential patterns mining
– Mining results maintenance
– RMI mining results transfer
– Mining results lookup and display
Project Programming Tasks (cont’d)
– Backup mechanism
• A separate thread backs up server files and in-memory data
• Restore the failed client’s work (with 林俊甫)
– RMI mechanism (with 林俊甫)
– GUI global states refreshment
– System integration
• Testing and debugging
Web Log File Format
• User IP
• Date
• Time
• Web page URL
Web File Preprocessing
• Select *.htm and *.html pages
• First sort by user ID
• Second sort by time
• Page sequences are separated by time
– a gap of more than 30 seconds starts a new sequence
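These preprocessing steps can be sketched as follows. The `Hit` and `LogPreprocessor` classes are hypothetical, and the sketch assumes the time has already been converted to seconds and that the user is identified by a string such as the IP address.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical log record: user, time in seconds (assumed), requested URL.
final class Hit {
    final String user;
    final long timeSec;
    final String url;

    Hit(String user, long timeSec, String url) {
        this.user = user;
        this.timeSec = timeSec;
        this.url = url;
    }
}

final class LogPreprocessor {
    static final long GAP_SEC = 30;  // gap that separates two page sequences

    // Keeps *.htm/*.html hits, sorts by user then time, and cuts a new
    // sequence whenever the user changes or the gap exceeds GAP_SEC.
    static List<List<String>> sequences(List<Hit> hits) {
        List<Hit> pages = new ArrayList<Hit>();
        for (Hit h : hits) {
            if (h.url.endsWith(".htm") || h.url.endsWith(".html")) {
                pages.add(h);
            }
        }
        Collections.sort(pages, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                int byUser = a.user.compareTo(b.user);
                return byUser != 0 ? byUser : Long.compare(a.timeSec, b.timeSec);
            }
        });
        List<List<String>> out = new ArrayList<List<String>>();
        List<String> current = null;
        Hit prev = null;
        for (Hit h : pages) {
            boolean startNew = prev == null
                    || !prev.user.equals(h.user)
                    || h.timeSec - prev.timeSec > GAP_SEC;
            if (startNew) {
                current = new ArrayList<String>();
                out.add(current);
            }
            current.add(h.url);
            prev = h;
        }
        return out;
    }
}
```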
Chunk Data Files
• Part*.ppp
6023 2 1 1 2 8
6024 1 1 206
6025 7 1 1 1 1 1 1 1 2 5 17 18 19 20 11
6026 3 1 1 1 144 145 338
6027 2 1 1 2 9
6028 3 1 1 1 2 8 3
• Items.ppp
/~visualdep/htm/p5b.htm 168
/~businessdep/student/picture.html 169
/~comedu/inde.htm 170
/~account/91tuition.htm 171
/~stuaffair/life/procedure-17.htm 172
/~stuaffair/life/procedure-25.htm 173
Apriori algorithm
• 1: find all L1
• 2: generate C2 from L1
• 3: count C2 and find all L2
• 4: k=3
• 5: generate & prune Ck from Lk-1
• 6: count Ck and find all Lk
• 7: if Lk not empty then k++, goto 5
Apriori algorithm (cont’d)
• Join phase: s1 joins s2 if s1 (drop first) = s2 (drop last)
– s1 = {a, b}, s2 = {b, a}
– s1 join s2 => {a, b, a}
• Prune phase: delete a k-candidate if any of its (k-1)-subsequences is not large
• C and L are stored in hash data structures
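The join and prune phases can be sketched as below, with page ids as integers. `SequenceCandidates` is a hypothetical class, and the prune check assumes that a (k-1)-subsequence is obtained by deleting any single element of the candidate; the slides do not say whether only contiguous subsequences are meant.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of candidate generation for sequential patterns.
final class SequenceCandidates {

    // Join: if s1 without its first item equals s2 without its last item,
    // returns s1 extended by the last item of s2; otherwise returns null.
    static List<Integer> join(List<Integer> s1, List<Integer> s2) {
        if (s1.isEmpty() || s1.size() != s2.size()) {
            return null;
        }
        if (!s1.subList(1, s1.size()).equals(s2.subList(0, s2.size() - 1))) {
            return null;
        }
        List<Integer> joined = new ArrayList<Integer>(s1);
        joined.add(s2.get(s2.size() - 1));
        return joined;
    }

    // Prune check: true only if every subsequence obtained by deleting one
    // element of the candidate is in the set of large sequences.
    static boolean survivesPrune(List<Integer> candidate, Set<List<Integer>> large) {
        for (int i = 0; i < candidate.size(); i++) {
            List<Integer> sub = new ArrayList<Integer>(candidate);
            sub.remove(i);  // removes by index
            if (!large.contains(sub)) {
                return false;
            }
        }
        return true;
    }

    // Generates Ck from Lk-1 by joining every ordered pair and pruning.
    static Set<List<Integer>> generate(Set<List<Integer>> large) {
        Set<List<Integer>> candidates = new HashSet<List<Integer>>();
        for (List<Integer> s1 : large) {
            for (List<Integer> s2 : large) {
                List<Integer> joined = join(s1, s2);
                if (joined != null && survivesPrune(joined, large)) {
                    candidates.add(joined);
                }
            }
        }
        return candidates;
    }
}
```

With a = 1 and b = 2, `join([1, 2], [2, 1])` gives `[1, 2, 1]`, matching the example on the slide. Storing the sequences in a `HashSet` gives the constant-time membership test that the prune phase relies on.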
Mining Result Display
• Client frequent patterns
– Web page ID
– Support
– Saved as *.pppl files
• Client frequent patterns
– Web page ID
– Support
– Web page name
Backup Mechanism
• When a backup server is selected, that client starts a backup thread
• The backup thread loops every 0.5 second
• RMI data transfer
– Chunk data files (part*.ppp, items.ppp)
– Client information
– File chunk information
• determine MaxID and set “in use” to “available”
– Frequent patterns information
System Integration
• Java class integration
– Server component
– Client component
– Data mining component
– GUI component
• Testing
• Debugging
Project Programming Tasks
• D92725001 王慧芬
– Graphical User Interface
• Since this system performs its data mining task in a distributed way, its GUI provides four panels:
– A system console
– A result window
– A connection table
– A graphical network configuration
GUI
• The system console shows how the system
proceeds
GUI (cont’d)
• The result window displays the progress
and results of data mining
GUI (cont’d)
• A connection table lists all of the on-line
client connection information
GUI (cont’d)
• A connection table consists of 5 fields
– NO:client-server connection id
– IP address:client’s IP address
– Port:client’s port number
– Status: connection status, one of
• 0: offline
• 1: online
• 2: file transfer from server to client
• 3: client is doing data mining
• 4: client returns its value to the server once data mining has finished
• 5: client is doing the backup and data mining at the same time
– # chunk works on: if data mining and backup, it indicates the chunk number that the connection works on
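The status codes map naturally onto an enum. This is only a sketch; the constant names and the `fromCode` helper are hypothetical, and only the numeric codes and their meanings come from the slide.

```java
// Hypothetical enum for the Status column; constant names are illustrative.
enum ConnectionStatus {
    OFFLINE(0),
    ONLINE(1),
    FILE_TRANSFER(2),      // file transfer from server to client
    MINING(3),             // client is doing data mining
    RETURNING_RESULT(4),   // client returns its value to the server
    BACKUP_AND_MINING(5);  // client does backup and data mining together

    final int code;

    ConnectionStatus(int code) {
        this.code = code;
    }

    // Looks up the status for a numeric code read from the connection table.
    static ConnectionStatus fromCode(int code) {
        for (ConnectionStatus s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("unknown status code: " + code);
    }
}
```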
GUI (cont’d)
• A graphical network configuration follows the
connection table to depict the dynamic
network configuration
GUI (cont’d)
• In the dynamic network configuration, we use
different client GIFs to express the status:
– Offline
– Data mining
– Backup and mining
– On-line
GUI interface
• mw.showMsg()
– provided by GUI for server/client module to show the
console message
• mw.showResultString()
– provided by GUI for server/client module to show the
results of data mining
• Connection table
– modified by server/client module for connection
information
– read by GUI every 0.01 second to depict the dynamic
network configuration
GUI design
• Java Swing is used to generate the labels, text,
scrollbars, tables, etc.
• Java AWT 2D painting is used to form the
animation of the connection lines in the
dynamic configuration panel
• ‘Photo Impact’ and ‘GIF animator’ are used
to generate the node icons
• EasyRGB is used to tune the color
harmonies.
GUI design (cont’d)
• A new thread is forked from the GUI task to animate the connection lines in the dynamic configuration panel:
– it reads the table every 0.03 second and shows the connection status with a moving ball.
• Diagram: the GUI generates the system console, the result panel, the connection table, and the connection table animation.
Installation
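• Example: running one server and two clients
– Create three folders: Ser (Server), Cli (Client 1), and Cli2 (Client 2)
– Extract the attached archive into the Ser folder; also download weblog10.zip into this folder and extract it
– Extract the attached archive into the empty Cli and Cli2 folders
– Open two DOS windows (windows 1 and 2) and change to the Ser folder
– Open three DOS windows (windows 3, 4, and 5); in windows 3 and 4 change to the Cli folder, in window 5 change to the Cli2 folder
– In window 1, run compile.bat, then run rmi.bat
– In window 2, run server.bat
– In window 3, run compile.bat, then run rmi.bat
– In window 4, run client.bat
– In window 5, run compile.bat, then run client.bat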