JIT-Compiler-Assisted Distributed Java Virtual Machine
Wenzhang Zhu, Cho-Li Wang, Weijian Fang and Francis C. M. Lau
The Systems Research Group
Department of Computer Science and Information Systems
The University of Hong Kong
Presented by Cho-Li Wang
Outline
Distributed Java Virtual Machine (DJVM)
Design tradeoffs
Related work
JESSICA2 DJVM


JIT-compiler-assisted dynamic thread migration
Global Object Space (GOS) for location-transparent object access
Experimental results + A demo
Conclusion & future work
TCHPC 2004, Taiwan, Mar, 2004
Distributed Java Virtual Machine (DJVM)
import java.util.Random;

// Each worker sums the integers 0..n-1 in its own thread.
class worker extends Thread {
    private final long n;
    public worker(long N) { n = N; }
    @Override
    public void run() {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += i;
        System.out.println("N=" + n + " Sum=" + sum);
    }
}

public class test {
    static final int N = 100;
    public static void main(String[] args) {
        worker[] w = new worker[N];
        Random r = new Random();
        // Bounded, non-negative work size so that every worker terminates.
        for (int i = 0; i < N; i++)
            w[i] = new worker(r.nextInt(1_000_000_000));
        for (int i = 0; i < N; i++) w[i].start();
        try {
            for (int i = 0; i < N; i++) w[i].join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
A distributed Java Virtual Machine (DJVM) consists of a group of extended JVMs running in a distributed environment to support true parallel execution of a multithreaded Java application.
A DJVM provides all the JVM services in compliance with the Java language specification.
A DJVM provides the illusion that the program is running on a single, more powerful machine: a Single System Image (SSI).
[Figure: Java threads running on a DJVM (Single System Image); labels: Bytecode Execution Engine, Heap, Thread, Class, four JVMs]
Design Tradeoffs of a DJVM
How to manage the threads?
  Distributed thread scheduling
  Initial thread placement vs. migration
How to store the data?
  Object store: a global heap shared by threads?
  Memory consistency: Java memory model?
  Can an off-the-shelf DSM be used, or something else?
How to process the bytecode?
  Execution engine: interpretation, Just-in-Time (JIT) compilation, or static compilation
  High performance?
[Figure: DJVM components: thread scheduler, execution engine, heap]
Related work
cJVM (IBM Haifa Research)
  Interpreter mode execution
  Embedded OO-based DSM (Proxy)
JAVA/DSM (Rice University)
  Interpreter mode execution
  Heap built on top of a page-based DSM
JESSICA (HKU)
  Thread migration
  Interpreter mode execution
  Heap built on top of a page-based DSM
Jackal, Hyperion
  Static compilation
  Link to an object-based DSM
[Figure: summary boxes for the systems above, labelled by thread distribution (remote creation, manual distribution, transparent migration), execution mode (interpreter, static compilation) and DSM type (embedded OO-based with proxy, page-based, OO-based)]
JESSICA2 (Java-Enabled Single-System-Image Computing Architecture)
[Figure: a multithreaded Java program running on one master and several worker JESSICA2 JVMs; labels: Thread Migration, JIT Compiler Mode, Portable Java Frame]
Global Object Space: a shared global heap spanning all cluster nodes
JESSICA2 Main Features
Cluster-aware bytecode execution engine (JITEE)
  JVM operated in Just-In-Time (JIT) compilation mode
  Cluster-aware: global naming scheme for threads, objects, ...
JIT-compiler-assisted dynamic thread migration
  Runtime capturing and restoring of thread execution context
  No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
  Enables dynamic load balancing
Global Object Space (GOS)
  Provides location-transparent object access for threads
  Tightly integrated with the JVM
  Memory consistency: compliant with the Java Memory Model (JMM)
  Various optimizing schemes: adaptive home migration, synchronized method shipping, object pushing
  I/O redirection
JESSICA2 thread migration
(In a JIT-enabled JVM)
RTC: Raw Thread Context
BTC: Bytecode-oriented Thread Context = thread id + Java frames (class name, method signature, PC, operand stack pointer, local variables, ...)
The RTC is transformed into the BTC directly inside the JIT compiler.
[Figure: migration of a thread from the source node to the destination node. Labels: (1) Alert (load monitor, thread scheduler); (2) stack analysis and stack capturing (migration manager, RTC frames to BTC); frame parsing; (3) restore execution. Each node holds a JVM with a method area, frames and a PC.]
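The BTC definition above (thread id plus Java frames) can be pictured as a plain data structure. The sketch below is illustrative only: the class and field names (`ThreadContext`, `JavaFrame`, `operandStackDepth`, and so on) are assumptions made for this example and are not taken from the JESSICA2 sources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one Java frame inside a BTC (names are illustrative).
final class JavaFrame {
    final String className;        // e.g. "CPI"
    final String methodSignature;  // e.g. "run()V"
    final int bytecodePc;          // bytecode program counter
    final int operandStackDepth;   // stands in for the operand stack pointer
    final List<Object> locals;     // local variable values

    JavaFrame(String className, String methodSignature, int bytecodePc,
              int operandStackDepth, List<Object> locals) {
        this.className = className;
        this.methodSignature = methodSignature;
        this.bytecodePc = bytecodePc;
        this.operandStackDepth = operandStackDepth;
        this.locals = new ArrayList<>(locals);  // defensive copy
    }

    // Human-readable form: class::method@pc
    String describe() {
        return className + "::" + methodSignature + "@" + bytecodePc;
    }
}

// Hypothetical BTC: a thread id plus the captured Java frames.
final class ThreadContext {
    final long threadId;
    final List<JavaFrame> frames = new ArrayList<>();

    ThreadContext(long threadId) {
        this.threadId = threadId;
    }

    void push(JavaFrame frame) {
        frames.add(frame);
    }
}
```

Because such a structure refers only to bytecode-level notions (method, bytecode PC, typed variables), it does not depend on the native stack layout of the node that captured it.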
Thread Stack Transformation
Raw Thread Context (RTC) on the source node (%esp: stack pointer):
  %esp:    0x00000000
  %esp+4:  0x082ca809
  %esp+8:  0x08225400
  %esp+12: 0x08266bc0
Stack capturing produces the Bytecode-oriented Thread Context (BTC):
  Frames{
    method CPI::run()V@111
    local=13;stack=0;
    var:
      arg0:CPI, 33, 0x8225400
      local1: [D; 33, 0x8266bc0@2
      local2: int, 2;
      ...
  Figure annotations: method id; bytecode program counter; node id; [ : array; D: double
Stack restoration rebuilds a raw context on the destination node:
  %esp:    0x00000000
  %esp+4:  0x086243c
  %esp+8:  0x08623200
  %esp+12: 0x08293010
  ...
  %eax = 0x08623200
  %ebx = 0x08293010
Thread State Capturing : Details
Bytecode verifier
  Construct control flow graph
  Migration points: (1) head of basic block (loop); (2) before a method invocation (invoke)
Bytecode translation
  1. Add migration checking code (cmp mflag,0)
  2. Add object checking (local or remote object; Global Object Space)
  3. Add type and register spilling
Intermediate code
Code generation
Native code
Linking and constant resolution
[Figure: Java frame detection on the raw thread stack, which contains both Java frames and C frames]
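The effect of the migration check inserted at a loop head (`cmp mflag,0`) can be mimicked at the Java source level. This is only an analogy: JESSICA2 adds the check to JIT-generated native code, whereas the sketch below writes it by hand. `MigrationDemo`, `Snapshot` and `sumUntilMigration` are names invented for the illustration; only `mflag` echoes the slide.

```java
// Hand-written analogue of a migration check at a loop head (illustrative only).
final class MigrationDemo {
    // Set by a hypothetical scheduler to request migration of the running thread.
    static volatile boolean mflag = false;

    // The state that would have to be captured at the migration point.
    static final class Snapshot {
        final long i;
        final long sum;

        Snapshot(long i, long sum) {
            this.i = i;
            this.sum = sum;
        }
    }

    // Sums start..n-1 on top of 'partial'. If migration is requested, returns a
    // snapshot of the loop state; otherwise stores the sum in result[0] and
    // returns null.
    static Snapshot sumUntilMigration(long start, long partial, long n, long[] result) {
        long sum = partial;
        for (long i = start; i < n; i++) {
            if (mflag) {                      // migration point: head of the loop body
                return new Snapshot(i, sum);  // capture state instead of continuing
            }
            sum += i;
        }
        result[0] = sum;
        return null;
    }
}
```

Resuming is then a matter of calling `sumUntilMigration(s.i, s.sum, n, result)` with the captured snapshot `s`, possibly on another node. The cost during normal execution is one flag test per loop iteration, which matches the kind of overhead the slides measure.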
Restoring: Dynamic Register Patching
(on i386 Architecture)
For each restored frame, a small code stub rebuilds the register context and jumps to the restore point inside the compiled method.
Small code stubs (rebuilt register context):
  Stub for frame 1:
    reg1 <- value1
    jmp restore_point1
  Stub for frame 0:
    reg1 <- value1
    reg2 <- value2
    jmp restore_point0
Compiled methods (native code):
  Method1(){
    ...
    restore_point1:
  }
  Method0(){
    ...
    restore_point0:
  }
Rebuilt stack as drawn, top to bottom (arrow: stack growth): frame 1, frame 0, trampoline frame, bootstrap frame. Each frame holds a saved %ebp and a return address.
  bootstrap(){
    trampoline();
    closing handler();
  }
%ebp: i386 frame pointer
"Ret Addr": return address of the current function call
Global Object Space (GOS)
Provide global heap abstraction for DJVM
Home-based object coherence protocol, compliant with the Java Memory Model
  OO-based to reduce false sharing
Non-blocking communication
  Use threaded I/O interface inside JVM for communication to hide the latency
Adaptive object home migration mechanism
  Take advantage of JVM runtime information for optimization
  Optimizations: home migration, synchronized method shipping, object pushing
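The idea of a home-based coherence protocol can be sketched in a few lines. The code below is a single-process toy, not the JESSICA2 protocol: it assumes each object is a single `int` identified by an integer id, that writes happen between `acquire()` and `release()`, and that cached copies are dropped on acquire and written back on release. `HomeNode` and `CachingNode` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Home node: holds the master copy of every object (toy model: id -> int value).
final class HomeNode {
    private final Map<Integer, Integer> master = new HashMap<>();

    int fetch(int objectId) {
        return master.getOrDefault(objectId, 0);
    }

    void writeBack(int objectId, int value) {
        master.put(objectId, value);
    }
}

// Non-home node: works on cached copies and talks to the home node only on
// cache misses and at synchronization points.
final class CachingNode {
    private final HomeNode home;
    private final Map<Integer, Integer> cache = new HashMap<>();
    private final Map<Integer, Integer> dirty = new HashMap<>();

    CachingNode(HomeNode home) {
        this.home = home;
    }

    // Acquire (lock): drop cached copies so later reads fetch from the home node.
    void acquire() {
        cache.clear();
    }

    // Read: served from the cache, fetched from the home node on a miss.
    int read(int objectId) {
        return cache.computeIfAbsent(objectId, id -> home.fetch(id));
    }

    // Write: applied to the cached copy and remembered as dirty.
    void write(int objectId, int value) {
        cache.put(objectId, value);
        dirty.put(objectId, value);
    }

    // Release (unlock): flush dirty objects to the home node.
    void release() {
        dirty.forEach((id, value) -> home.writeBack(id, value));
        dirty.clear();
    }
}
```

In this toy model a reader sees another node's update only after the writer has released and the reader has acquired, which is the visibility pattern that lock-based Java programs rely on. Working per object rather than per page is what avoids false sharing between unrelated objects.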
Experimental environment
HKU Gideon 300 Linux cluster : 300 P4 PCs (2GHz, 512 MB RAM, 40 GB disk)
Network: 312-port Foundry FastIron 1500 Non-blocking switch (100 Mbits/s)
Kaffe JVM version 1.0.6; Linux kernel 2.4.18-3 (RedHat 7.3)
Migration overhead during normal execution (SPECJVM98 benchmark)

Benchmark    Time (seconds)                    Space (native code/bytecode)
             No migration   Migration          No migration   Migration
compress     11.31          11.39 (+0.71%)     6.89           7.58 (+10.01%)
jess         30.48          30.96 (+1.57%)     6.82           8.34 (+22.29%)
raytrace     24.47          24.68 (+0.86%)     7.47           8.49 (+13.65%)
db           35.49          36.69 (+3.38%)     7.01           7.63 (+8.84%)
javac        38.66          40.96 (+5.95%)     6.74           8.72 (+29.38%)
mpegaudio    28.07          29.28 (+4.31%)     7.97           8.53 (+7.03%)
mtrt         24.91          25.05 (+0.56%)     7.47           8.49 (+13.65%)
jack         37.78          37.90 (+0.32%)     6.95           8.38 (+20.58%)
Average                     (+2.21%)                          (+15.68%)
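The percentages in the time columns follow from the raw timings as (migration - no migration) / no migration. The short program below recomputes the average time overhead from the eight measured pairs; the class and method names (`OverheadCheck`, `overheadPercent`, `averageOverhead`) are made up for this check.

```java
// Recomputes the time overhead of migration support from the SPECJVM98 timings.
final class OverheadCheck {
    // Relative overhead in percent of 'withMigration' over 'base'.
    static double overheadPercent(double base, double withMigration) {
        return (withMigration - base) / base * 100.0;
    }

    // Arithmetic mean of the per-benchmark overheads.
    static double averageOverhead(double[] base, double[] withMigration) {
        double total = 0.0;
        for (int i = 0; i < base.length; i++) {
            total += overheadPercent(base[i], withMigration[i]);
        }
        return total / base.length;
    }

    public static void main(String[] args) {
        // Order: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack
        double[] noMigration = {11.31, 30.48, 24.47, 35.49, 38.66, 28.07, 24.91, 37.78};
        double[] migration   = {11.39, 30.96, 24.68, 36.69, 40.96, 29.28, 25.05, 37.90};
        System.out.printf("average time overhead: %.2f%%%n",
                averageOverhead(noMigration, migration));
    }
}
```

The result agrees with the reported average of +2.21% to two decimal places, so the average is the mean of the per-benchmark percentages rather than the ratio of the total times.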
Migration overhead analysis
Program (frame #)   LT(1)   CPI(1)   ASP(1)   N-Body(8)   SOR(2)
Latency (ms)        4.997   2.680    4.678    10.803      8.467

Overall migration latency: about 2-10 ms

Migration time breakdown (LT program)

Frame #        1       2       4       6       8       10
Var #          4       15      37      59      81      103
Size (B)       201     417     849     1281    1713    2145
Capture (us)   202     266     410     495     605     730
Parse (us)     235     253     447     526     611     724
Create (us)    360     360     360     360     360     360
Compile (us)   478     575     847     1,169   1,451   1,720
Build (us)     7       11      14      16      21      28
Total (us)     1,282   1,465   2,078   2,566   3,048   3,562
GOS Optimizations
(using 4 PCs)
[Figure: stacked bar chart, 0% to 100%, for ASP, SOR, Nbody and TSP under the NO, H, HS and HSP configurations; bar segments: Obj, Syn, Comp]
NO = No optimizations
H = Home migration
HS = Home migration + Synchronized Method Shipping
HSP = HS + Object pushing
Application benchmark
[Figure: speedup (0 to 10) versus number of nodes (2, 4, 8) for CPI, TSP, Raytracer and nBody, with linear speedup shown for reference]
JESSICA2 vs JESSICA
(CPI)
[Figure: execution time in ms (0 to 250,000) of CPI (50,000,000 iterations) on JESSICA and JESSICA2 with 2, 4 and 8 nodes]
Parallel Ray Tracing
(using 64 nodes of the Gideon 300 cluster)
Linux 2.4.18-3 kernel (RedHat 7.3)
64 nodes: 108 seconds
1 node: 4402 seconds (about 1.2 hours)
Speedup = 4402/108 ≈ 40.76
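The speedup figure can be turned into a parallel efficiency by dividing it by the number of nodes. The helper below does that arithmetic for the ray tracing run; `Speedup`, `speedup` and `efficiency` are illustrative names, and efficiency is not a metric reported on the slide.

```java
// Speedup and parallel efficiency for a single-node vs. multi-node run.
final class Speedup {
    // Ratio of single-node time to multi-node time.
    static double speedup(double serialSeconds, double parallelSeconds) {
        return serialSeconds / parallelSeconds;
    }

    // Fraction of ideal (linear) speedup achieved on 'nodes' nodes.
    static double efficiency(double speedup, int nodes) {
        return speedup / nodes;
    }

    public static void main(String[] args) {
        double s = speedup(4402.0, 108.0);
        System.out.printf("speedup %.2f, efficiency %.1f%%%n", s, efficiency(s, 64) * 100.0);
    }
}
```

With the timings above, 64 nodes deliver roughly 64% of linear speedup.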
Demo
Execution Steps
1. Create the display panel
2. Start the ray tracing program on node 26 with 8 threads
3. Add two more nodes: 27 and 28
4. Add 5 more nodes: 29, 30, 31, 32, 33
Conclusions
Dynamic Java thread migration makes true parallel execution of Java threads possible and enables dynamic load balancing.
Runtime ("Just-In-Time") code instrumentation for thread state capturing and restoring is feasible.
An embedded GOS layer can take advantage of JVM runtime information to reduce communication overhead.
Advantages of native code
instrumentation
Lightweight
  Re-uses JIT compiler internal data structures and control flow analysis functions
  Instrumented native code is more efficient than instrumented bytecode
Transparent
  No source code modification
  No new API introduced
  No preprocessing
Future work
Advanced thread migration mechanism without overhead during normal execution
Incremental distributed GC
Enhanced Single I/O Space to benefit more real-life applications
Parallel I/O support
Thanks
JESSICA2 Webpage
http://www.csis.hku.hk/~clwang/projects/JESSICA2.html