JIT-Compiler-Assisted Distributed
Java Virtual Machine
Wenzhang Zhu, Cho-Li Wang, Weijian Fang and Francis C. M. Lau
Department of Computer Science and Information Systems
The University of Hong Kong
Presented by Cho-Li Wang
Outline
Distributed Java Virtual Machine
Design Tradeoffs
Related work
JESSICA2 features
Experimental results
Conclusion & future work
A raytracing demo
TCHPC 2004, Taiwan, Mar, 2004
Distributed Java Virtual
Machine (DJVM)
import java.util.Random;

// Each worker thread sums the integers below n and prints the result.
class worker extends Thread {
    private long n;
    public worker(long N) { n = N; }
    public void run() {
        long sum = 0;
        for (long i = 0; i < n; i++) sum += i;
        System.out.println("N=" + n + " Sum=" + sum);
    }
}

public class test {
    static final int N = 100;
    public static void main(String args[]) {
        worker[] w = new worker[N];
        Random r = new Random();
        // Create the workers, start them all, then wait for each to finish.
        for (int i = 0; i < N; i++)
            w[i] = new worker(r.nextLong());
        for (int i = 0; i < N; i++) w[i].start();
        try { for (int i = 0; i < N; i++) w[i].join(); }
        catch (Exception e) {}
    }
}
A distributed Java Virtual Machine (DJVM) consists of a group of extended JVMs running in a distributed environment to support true parallel execution of a multithreaded Java application.
A DJVM provides all the JVM services, in compliance with the Java language specification, as if it were running on a single machine: a Single System Image (SSI).
[Figure: a DJVM presents a Single System Image in which the threads, heap, classes and bytecode execution engine of a Java program span several JVMs.]
Design Tradeoffs of a DJVM
How to manage the threads?
  Distributed thread scheduling
  Initial placement vs thread migration
How to store the data?
  Distributed heap (object store)
  Java memory model (memory consistency)
  Can an off-the-shelf DSM be used as the heap?
How to process the bytecode?
  Execution engine: interpretation, Just-in-Time (JIT) compilation, static compilation
Related work
cJVM (IBM Haifa Research)
  Interpreter mode execution
  Built-in object caching
  Diagram labels: remote creation; interpreter; embedded OO-based DSM (proxy)
JAVA/DSM (Rice University)
  Interpreter mode execution
  Heap built on top of a page-based DSM
  Diagram labels: manual distribution; interpreter; page-based DSM
JESSICA (HKU)
  Thread migration
  Interpreter mode execution
  Heap built on top of a page-based DSM
  Diagram labels: transparent migration; interpreter; page-based DSM
Jackal, Hyperion
  Static compilation
  Link to object-based DSM
  Diagram labels: remote creation; static compilation; OO-based DSM
JESSICA2 (Java-Enabled Single-System-Image Computing Architecture)
[Figure: a multithreaded Java program runs on a cluster of JESSICA2 JVMs, one master and several workers, on top of a Global Object Space. Labels: thread migration, JIT compiler mode, portable Java frame.]
JESSICA2 Main Features
Transparent Java thread migration
  Runtime capturing and restoring of thread execution context
  No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
  Enables dynamic load balancing on clusters
JIT compiler-based execution engine (JITEE)
  Operates in Just-In-Time (JIT) compilation mode
  Cluster-aware
Global Object Space
  A shared global heap spanning all cluster nodes
  Provides location-transparent object access
  Adaptive migrating-home protocol for memory consistency, plus various optimizing schemes
  I/O redirection
JESSICA2 thread migration (in a JIT-enabled JVM)
RTC: Raw Thread Context
BTC: Bytecode-oriented Thread Context (thread id, frames, class names, method signature, PC, operand stack pointer, local vars …)
Transformation of the RTC into the BTC is done directly inside the JIT compiler.
[Figure: migration of a thread from a source node to a destination node. Labelled steps: (1) alert; (2) stack analysis and stack capturing; (3) restore execution. Components shown: load monitor, thread scheduler, migration manager, JVM method area, frames with PC, frame parsing.]
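To make the BTC fields listed on this slide more concrete, here is a minimal Java sketch of a portable thread context. The class and method names (`BtcFrame`, `BytecodeThreadContext`, `local`, `describe`) are hypothetical and chosen only for illustration; they are not taken from the JESSICA2 source, and the text layout only loosely follows the frame dump shown in the talk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one Java frame in bytecode-oriented (portable) form.
final class BtcFrame {
    final String className;       // e.g. "CPI"
    final String methodSignature; // e.g. "run()V"
    final int bytecodePc;         // bytecode index, not a native address
    final int operandStackDepth;  // number of live operand stack slots
    // Local variable name -> "type, value" text, kept in insertion order.
    final Map<String, String> locals = new LinkedHashMap<>();

    BtcFrame(String className, String methodSignature, int bytecodePc, int operandStackDepth) {
        this.className = className;
        this.methodSignature = methodSignature;
        this.bytecodePc = bytecodePc;
        this.operandStackDepth = operandStackDepth;
    }

    // Records one local variable and returns this frame for chaining.
    BtcFrame local(String name, String typeAndValue) {
        locals.put(name, typeAndValue);
        return this;
    }

    // Renders the frame as text.
    String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("method ").append(className).append("::").append(methodSignature)
          .append('@').append(bytecodePc).append('\n');
        sb.append("stack=").append(operandStackDepth).append(";\n");
        for (Map.Entry<String, String> e : locals.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append(";\n");
        }
        return sb.toString();
    }
}

// Hypothetical container: a thread id plus its captured frames.
final class BytecodeThreadContext {
    final long threadId;
    final List<BtcFrame> frames = new ArrayList<>();

    BytecodeThreadContext(long threadId) { this.threadId = threadId; }

    void push(BtcFrame frame) { frames.add(frame); }
}
```

Because such a context records bytecode-level positions and typed values rather than native addresses and registers, it is the kind of data that could plausibly be shipped to another node and rebuilt there, which is the role the slides assign to the BTC.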
Thread Stack Transformation
Raw Thread Context (RTC)
%esp:    0x00000000
%esp+4:  0x082ca809
%esp+8:  0x08225400
%esp+12: 0x08266bc0
...
%eax = 0x08623200
%ebx = 0x08293100
Stack capturing: RTC to BTC
Stack restoration: BTC to RTC
Bytecode-oriented Thread Context (BTC)
Frames{
method CPI::run()V@111
local=13;stack=0;
var:
arg0:CPI, 33, 0x8225400
local1: [D; 33, 0x8266bc0@2
local2: int, 2;
...
Details
[Figure: the JIT compilation pipeline extended for migration. Stages shown: bytecode verifier, bytecode translation, intermediate code, register allocation, code generation, native code, linking and constant resolution. The Global Object Space also appears as a label.]
Construct control flow graph
Migration point selection: head of a basic block; method invocation (INVOKESTATIC, INVOKESPECIAL, INVOKEVIRTUAL and INVOKEINTERFACE)
Instrumentation: migration checking, object checking, non-destructive register spilling, type spilling for variable type deducing
Restore: register rebuild from variables
mov var1->reg1
mov var2->reg2
...
Java frame detection: Java frames and C frames on the raw thread stack
Example of native code instrumentation
[Figure: instrumented native code listing, not preserved in this transcript.]
Optimization on migration points: Pseudo-inlining
Purpose: eliminate the cost of unnecessary inserted migration points
General idea: delete migration points (M-points) before a small method invocation
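The selection rule and the pseudo-inlining idea can be sketched in Java as follows. This is an illustration under stated assumptions, not the JESSICA2 implementation: the `Insn` model, the `MigrationPoints.select` method and the `SMALL_METHOD_BYTES` threshold are all hypothetical, and judging "small" by callee bytecode length is an assumption, since the slides do not say how a small method is identified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified view of one bytecode instruction.
final class Insn {
    final int index;          // position in the method
    final boolean blockHead;  // first instruction of a basic block
    final boolean invoke;     // INVOKESTATIC/SPECIAL/VIRTUAL/INTERFACE
    final int calleeSize;     // callee bytecode length; ignored unless invoke

    Insn(int index, boolean blockHead, boolean invoke, int calleeSize) {
        this.index = index;
        this.blockHead = blockHead;
        this.invoke = invoke;
        this.calleeSize = calleeSize;
    }
}

final class MigrationPoints {
    // Assumed threshold for what counts as a "small" callee.
    static final int SMALL_METHOD_BYTES = 16;

    // Returns the instruction indices that receive a migration check.
    static List<Integer> select(List<Insn> code, boolean pseudoInlining) {
        List<Integer> points = new ArrayList<>();
        for (Insn insn : code) {
            boolean smallCallee = insn.invoke && insn.calleeSize <= SMALL_METHOD_BYTES;
            // With pseudo-inlining, the M-point before a small call is dropped.
            boolean atInvoke = insn.invoke && !(pseudoInlining && smallCallee);
            if (insn.blockHead || atInvoke) {
                points.add(insn.index);
            }
        }
        return points;
    }
}
```

For a method whose instructions are a block head followed by a call to an 8-byte callee and a call to a 200-byte callee, this sketch selects all three positions without pseudo-inlining and drops only the check before the small call when pseudo-inlining is enabled.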
Dynamic Register Patching
[Figure: stack layout during restoration, with the direction of stack growth marked. Frames shown: frame 1, frame 0, a trampoline frame and a bootstrap frame, each with its saved %ebp and return address.]
Patching code for frame 1:
reg1 <- value1
jmp restore_point1
Patching code for frame 0:
reg1 <- value1
reg2 <- value2
jmp restore_point0
Compiled methods:
Method1(){
...
restore_point1:
}
Method0(){
...
restore_point0:
}
bootstrap(){
trampoline();
closing handler();
}
Advantages of native code instrumentation
Lightweight
  Re-uses JIT compiler internal data structures and control flow analysis functions
  No need to include debugging information in Java class files
  Instrumented native code is more efficient than instrumented bytecode
Transparent
  No source code modification
  No new API introduced
  No preprocessing
Global Object Space (GOS)
Provides a global heap abstraction for the DJVM
Home-based object coherence protocol, compliant with the JVM memory model
  OO-based to reduce false sharing
Non-blocking communication
  Uses a threaded I/O interface inside the JVM for communication to hide the latency
Adaptive object home migration mechanism
  Takes advantage of JVM runtime information for optimization
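As a rough illustration of what adaptive home migration might look like, the Java sketch below moves an object's home to a remote node once that node has written to it several times in a row. The policy, the `HomedObject` class and the `MIGRATE_AFTER` threshold are assumptions made for this example; the slides do not describe the actual decision rule used by the GOS.

```java
// Hypothetical model of a shared object with a home node.
final class HomedObject {
    // Assumed threshold: consecutive remote writes before the home moves.
    static final int MIGRATE_AFTER = 3;

    private int homeNode;
    private int lastRemoteWriter = -1;
    private int consecutiveRemoteWrites = 0;

    HomedObject(int homeNode) { this.homeNode = homeNode; }

    int home() { return homeNode; }

    // Records a write from the given node.
    // Returns true if the write was remote, i.e. it needed the home node.
    boolean write(int node) {
        if (node == homeNode) {
            // A local write breaks any remote writer's streak.
            lastRemoteWriter = -1;
            consecutiveRemoteWrites = 0;
            return false;
        }
        if (node == lastRemoteWriter) {
            consecutiveRemoteWrites++;
        } else {
            lastRemoteWriter = node;
            consecutiveRemoteWrites = 1;
        }
        if (consecutiveRemoteWrites >= MIGRATE_AFTER) {
            // One node dominates the writes: make it the new home.
            homeNode = node;
            lastRemoteWriter = -1;
            consecutiveRemoteWrites = 0;
        }
        return true;
    }
}
```

In this sketch, once the home has moved, further writes from the same node are local and need no message to another node, which is the kind of communication saving that home migration aims for.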
GOS runtime data structure
[Figure: layout of a master object and a cache object.]
Master object: object header, cache pointer, object data
Cache object: object header, cache pointer, cache data
Cache header: master host id, master address, class, cache obj list
Cache obj list entries (linked by next): thread id, status, cache data, next
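The field names on this slide can be read as a header that points to a linked list of cache entries. The Java sketch below models that reading; the class names (`CacheHeader`, `CacheEntry`), the field types and the `add` and `find` helpers are assumptions for illustration, and the real structures inside the JVM are native data, not Java objects.

```java
// Hypothetical entry in the cache object list.
final class CacheEntry {
    final long threadId;
    int status;        // the encoding of status values is not given on the slide
    byte[] cacheData;  // cached copy of the object data
    CacheEntry next;   // next entry in the list, or null

    CacheEntry(long threadId, int status, byte[] cacheData) {
        this.threadId = threadId;
        this.status = status;
        this.cacheData = cacheData;
    }
}

// Hypothetical cache header that locates the master copy.
final class CacheHeader {
    final int masterHostId;    // node holding the master object
    final long masterAddress;  // address of the master object on that node
    final String className;
    CacheEntry cacheObjList;   // head of the linked list

    CacheHeader(int masterHostId, long masterAddress, String className) {
        this.masterHostId = masterHostId;
        this.masterAddress = masterAddress;
        this.className = className;
    }

    // Prepends an entry to the list.
    void add(CacheEntry e) {
        e.next = cacheObjList;
        cacheObjList = e;
    }

    // Returns the entry for the given thread, or null if there is none.
    CacheEntry find(long threadId) {
        for (CacheEntry e = cacheObjList; e != null; e = e.next) {
            if (e.threadId == threadId) {
                return e;
            }
        }
        return null;
    }
}
```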
Experimental environment
HKU Gideon 300 Linux cluster: 300 P4 PCs (2 GHz, 512 MB RAM, 40 GB disk)
Network: 312-port Foundry FastIron 1500 non-blocking switch (100 Mbit/s)
Migration overhead during normal execution (SPECJVM98 benchmark)

Benchmark   Time (seconds)                  Space (native code/bytecode)
            No migration  Migration         No migration  Migration
compress    11.31         11.39 (+0.71%)    6.89          7.58 (+10.01%)
jess        30.48         30.96 (+1.57%)    6.82          8.34 (+22.29%)
raytrace    24.47         24.68 (+0.86%)    7.47          8.49 (+13.65%)
db          35.49         36.69 (+3.38%)    7.01          7.63 (+8.84%)
javac       38.66         40.96 (+5.95%)    6.74          8.72 (+29.38%)
mpegaudio   28.07         29.28 (+4.31%)    7.97          8.53 (+7.03%)
mtrt        24.91         25.05 (+0.56%)    7.47          8.49 (+13.65%)
jack        37.78         37.90 (+0.32%)    6.95          8.38 (+20.58%)
Average                   (+2.21%)                        (+15.68%)
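The percentages in this table follow from the raw timings as (migration - no migration) / no migration, and the reported average matches the mean of the eight per-benchmark percentages. The short Java sketch below recomputes the time column; the class and method names (`Overhead`, `percent`, `averageTimeOverhead`) are illustrative only.

```java
final class Overhead {
    // Execution times in seconds, in table order: compress, jess, raytrace,
    // db, javac, mpegaudio, mtrt, jack.
    static final double[] NO_MIGRATION = {11.31, 30.48, 24.47, 35.49, 38.66, 28.07, 24.91, 37.78};
    static final double[] MIGRATION    = {11.39, 30.96, 24.68, 36.69, 40.96, 29.28, 25.05, 37.90};

    // Relative overhead of the instrumented run, in percent.
    static double percent(double base, double withMigration) {
        return (withMigration - base) / base * 100.0;
    }

    // Mean of the per-benchmark time overheads, in percent.
    static double averageTimeOverhead() {
        double sum = 0.0;
        for (int i = 0; i < NO_MIGRATION.length; i++) {
            sum += percent(NO_MIGRATION[i], MIGRATION[i]);
        }
        return sum / NO_MIGRATION.length;
    }

    public static void main(String[] args) {
        System.out.printf("compress: +%.2f%%%n", percent(NO_MIGRATION[0], MIGRATION[0]));
        System.out.printf("average:  +%.2f%%%n", averageTimeOverhead());
    }
}
```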
Migration overhead analysis

Overall migration latency
Program (frame #)  LT(1)   CPI(1)  ASP(1)  N-Body(8)  SOR(2)
Latency (ms)       4.997   2.680   4.678   10.803     8.467

Migration time breakdown (LT program)
Frame #       1      2      4      6      8      10
Var #         4      15     37     59     81     103
Size (B)      201    417    849    1281   1713   2145
Capture (us)  202    266    410    495    605    730
Parse (us)    235    253    447    526    611    724
Create (us)   360    360    360    360    360    360
Compile (us)  478    575    847    1,169  1,451  1,720
Build (us)    7      11     14     16     21     28
Total (us)    1,282  1,465  2,078  2,566  3,048  3,562
GOS Optimizations (using 4 PCs)
[Figure: stacked bar chart showing the percentage breakdown (0% to 100%) into Obj, Syn and Comp for ASP, SOR, Nbody and TSP under the four configurations NO, H, HS and HSP.]
NO = No optimizations
H = Home migration
HS = Home migration + Synchronized Method Shipping
HSP = HS + Object pushing
JESSICA2 vs JESSICA (CPI)
[Figure: bar chart of execution time in ms (axis 0 to 250000) for CPI with 50,000,000 iterations, comparing JESSICA and JESSICA2 on 2, 4 and 8 nodes.]
Application benchmark
[Figure: speedup (axis 0 to 10) versus number of nodes (2, 4, 8) for CPI, TSP, Raytracer and nBody, plotted against linear speedup.]
Parallel Ray Tracing (using 64 nodes of Gideon 300 cluster)
Linux 2.4.18-3 kernel (Redhat 7.3)
64 nodes: 108 seconds
1 node: 4402 seconds (about 1.2 hours)
Speedup = 4402/108 ≈ 40.76
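The speedup figure is the single-node time divided by the 64-node time. Dividing the speedup by the node count gives the parallel efficiency, a derived quantity that is not on the slide. The Java sketch below computes both from the two timings; the class and method names are illustrative.

```java
final class Speedup {
    // Speedup of a parallel run relative to the single-node run.
    static double speedup(double singleNodeSeconds, double parallelSeconds) {
        return singleNodeSeconds / parallelSeconds;
    }

    // Fraction of ideal (linear) speedup achieved on the given node count.
    static double efficiency(double singleNodeSeconds, double parallelSeconds, int nodes) {
        return speedup(singleNodeSeconds, parallelSeconds) / nodes;
    }

    public static void main(String[] args) {
        System.out.printf("speedup:    %.2f%n", speedup(4402, 108));
        System.out.printf("efficiency: %.1f%%%n", efficiency(4402, 108, 64) * 100.0);
    }
}
```

With the timings above, the efficiency comes out at roughly 64% of linear speedup on 64 nodes.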
Conclusions
Transparent Java thread migration in the JIT compiler enables high-performance execution of multithreaded Java applications on clusters
An embedded GOS layer can take advantage of JVM runtime information to reduce communication overhead
Future work
Advanced thread migration mechanism
without overhead during normal
execution (finished)
Incremental Distributed GC
Enhanced Single I/O Space to benefit
more real-life applications
Parallel I/O Support
Thanks
JESSICA2 Webpage
http://www.csis.hku.hk/~clwang/projects/JESSICA2.html