Telegraph Java Experiences
Sam Madden
UC Berkeley
2/14/01
RightOrder : Telegraph & Java
1
Telegraph Overview
100% Java
In-memory database
Query engine for alternative sources
  Web
  Sensors
Testbed for adaptive query processing

Telegraph & WWW : FFF
Federated Facts and Figures
Collect data on the election
Based on Avnur and Hellerstein's SIGMOD '00 work: Eddies
  Route tuples dynamically based on source loads and selectivities

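A minimal sketch of the routing idea, not the Telegraph implementation: each filter operator records how many tuples it has seen and passed, and every tuple visits the operators with the lowest observed pass rate first. The names EddySketch, Op, and run are hypothetical, the ranking rule is a simplification that stands in for the routing policy of the Eddies paper, and source load is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class EddySketch {
    /** A filter operator that remembers how many tuples it has seen and passed. */
    static final class Op {
        final String name;
        final IntPredicate predicate;
        long seen = 0;
        long passed = 0;

        Op(String name, IntPredicate predicate) {
            this.name = name;
            this.predicate = predicate;
        }

        /** Observed pass rate; an operator with no history ranks first. */
        double passRate() {
            return seen == 0 ? 0.0 : (double) passed / seen;
        }

        boolean apply(int tuple) {
            seen++;
            boolean ok = predicate.test(tuple);
            if (ok) passed++;
            return ok;
        }
    }

    /** Routes each tuple through every operator, most selective (lowest pass rate) first. */
    static List<Integer> run(int[] tuples, List<Op> ops) {
        List<Integer> out = new ArrayList<>();
        List<Op> order = new ArrayList<>(ops);
        for (int t : tuples) {
            // Re-rank before each tuple so routing adapts to observed selectivities.
            order.sort(Comparator.comparingDouble(Op::passRate));
            boolean alive = true;
            for (Op op : order) {
                if (!op.apply(t)) {
                    alive = false; // dropped: later operators never see this tuple
                    break;
                }
            }
            if (alive) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Op> ops = Arrays.asList(
            new Op("even", x -> x % 2 == 0),
            new Op("multipleOf5", x -> x % 5 == 0));
        int[] tuples = new int[30];
        for (int i = 0; i < tuples.length; i++) tuples[i] = i + 1;
        System.out.println(run(tuples, ops)); // prints [10, 20, 30]
    }
}
```

The result is the same whatever the routing order; only the number of predicate evaluations changes, which is the cost an adaptive router tries to reduce.
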
fff.cs.berkeley.edu
Architecture Overview
Query Parser
  JLex & CUP
Preoptimizer
  Chooses access paths
Eddy
  Routes tuples to modules

Modules
Doubly-Pipelined Hash Joins
Index Joins
  For probing into web pages
Aggregates & Group Bys
Scans
  Telegraph Screen Scraper: view web pages as relations

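A minimal sketch of a doubly-pipelined (symmetric) hash join, under assumed names: each input has its own hash table, and every arriving tuple is first inserted into its side's table and then probed against the other side, so results stream out without waiting for either input to finish. SymmetricHashJoin, onLeft, and onRight are hypothetical, and tuples are reduced to an integer key plus a string payload.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SymmetricHashJoin {
    // One hash table per input, keyed on the join attribute.
    private final Map<Integer, List<String>> left = new HashMap<>();
    private final Map<Integer, List<String>> right = new HashMap<>();

    /** Inserts a left tuple, then probes the right table; returns the new results. */
    public List<String> onLeft(int key, String value) {
        left.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String match : right.getOrDefault(key, Collections.emptyList())) {
            out.add(value + "|" + match);
        }
        return out;
    }

    /** Inserts a right tuple, then probes the left table; returns the new results. */
    public List<String> onRight(int key, String value) {
        right.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        List<String> out = new ArrayList<>();
        for (String match : left.getOrDefault(key, Collections.emptyList())) {
            out.add(match + "|" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        SymmetricHashJoin join = new SymmetricHashJoin();
        System.out.println(join.onLeft(1, "l1"));  // prints []
        System.out.println(join.onRight(1, "r1")); // prints [l1|r1]
        System.out.println(join.onLeft(1, "l2"));  // prints [l2|r1]
    }
}
```

Because neither input blocks the other, this shape suits slow or bursty sources such as web pages. The cost is that both hash tables stay in memory.
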
Execution Framework
One Thread Per Query
Iterator Model for Queries
  Experimented with thread per module
  Linux threads are expensive
Two Memory Management Models
  Java objects
  Home-rolled byte arrays

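A minimal sketch of the iterator model with one thread per query, using java.util.Iterator as the operator interface. The Filter operator and the drain helper are hypothetical names; Telegraph's own module interface is not shown in these slides. Each operator pulls tuples from its child on demand, so the whole plan runs on the thread that calls next() at the root.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

public class IteratorModel {
    /** Filter operator: pulls from its child on demand and passes matching tuples up. */
    static final class Filter<T> implements Iterator<T> {
        private final Iterator<T> child;
        private final Predicate<T> predicate;
        private T lookahead;
        private boolean ready = false;

        Filter(Iterator<T> child, Predicate<T> predicate) {
            this.child = child;
            this.predicate = predicate;
        }

        @Override
        public boolean hasNext() {
            // Advance the child until a tuple matches or the child is exhausted.
            while (!ready && child.hasNext()) {
                T candidate = child.next();
                if (predicate.test(candidate)) {
                    lookahead = candidate;
                    ready = true;
                }
            }
            return ready;
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            ready = false;
            return lookahead;
        }
    }

    /** Drains a plan on the calling thread, i.e. one thread per query. */
    static <T> List<T> drain(Iterator<T> root) {
        List<T> out = new ArrayList<>();
        while (root.hasNext()) out.add(root.next());
        return out;
    }

    public static void main(String[] args) {
        Iterator<Integer> scan = Arrays.asList(3, 8, 15, 4, 23).iterator();
        Iterator<Integer> plan = new Filter<>(scan, x -> x > 5);
        System.out.println(drain(plan)); // prints [8, 15, 23]
    }
}
```

A thread-per-module design would instead put a queue between operators and run each on its own thread, which is where thread cost starts to matter.
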
Tuples as Java Objects
Tuple data stored as a Java object
  Each in a separate byte array
Tuples copied on joins, aggregates
Issues
  Memory management between modules, queries, garbage collector control
  Allocation overhead
Performance: 30,000 200-byte tuples/sec -> 5.9 MB/sec

Tuples As Byte Array
All tuples stored in the same byte array per query
[Diagram: surrogate Java objects, each holding an (offset, size) pair, pointing into a shared byte array; labels: Surrogate Objects, Directory, Byte Array]

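A minimal sketch of the byte-array model described above, assuming one fixed-size array per query. TupleArena and Surrogate are hypothetical names, and compaction and the directory are left out. Tuple bytes are copied into the shared array, and the caller holds only a small surrogate object that records the offset and size.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TupleArena {
    /** Surrogate object: holds no tuple data, only where the tuple lives in the arena. */
    public static final class Surrogate {
        final int offset;
        final int size;

        Surrogate(int offset, int size) {
            this.offset = offset;
            this.size = size;
        }
    }

    private final byte[] arena; // one shared array per query
    private int used = 0;       // bytes handed out so far

    public TupleArena(int capacityBytes) {
        arena = new byte[capacityBytes];
    }

    /** Copies tuple bytes into the arena; fails once the query's budget is exhausted. */
    public Surrogate append(byte[] tuple) {
        if (used + tuple.length > arena.length) {
            throw new IllegalStateException("query memory budget exhausted");
        }
        System.arraycopy(tuple, 0, arena, used, tuple.length);
        Surrogate s = new Surrogate(used, tuple.length);
        used += tuple.length;
        return s;
    }

    /** Returns a copy of the bytes the surrogate refers to. */
    public byte[] read(Surrogate s) {
        return Arrays.copyOfRange(arena, s.offset, s.offset + s.size);
    }

    public int bytesUsed() {
        return used;
    }

    public static void main(String[] args) {
        TupleArena arena = new TupleArena(1024);
        Surrogate a = arena.append("alpha".getBytes(StandardCharsets.UTF_8));
        Surrogate b = arena.append("beta".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(arena.read(b), StandardCharsets.UTF_8)); // prints beta
        System.out.println(arena.bytesUsed()); // prints 9
    }
}
```

The fixed capacity is the point of the design: a query cannot use more memory than its array, which ordinary heap allocation cannot guarantee.
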
Byte Array (cont)
Allows explicit control over memory per query (or module)
Compaction eliminates garbage collection randomness
Lower throughput: 15,000 tuples/sec
  No surrogate object reuse
  Synchronization costs

Other System Pieces
XML-Based Catalog
  Java introspection helps
Applet-based Front End
JDBC Interface
Fault Tolerance / Multiple Servers
  Via simple UNIX tools

RightOrder Questions
Performance vs. C
JNI Issues
Garbage Collection Issues
Serialization Costs
Lots of Java Objects
JDBC vs ODI

Performance Vs. C
JVM + JIT performance encouraging:
  IBM JIT == 60% of Intel C compiler, faster than MSC for low-level benchmarks
  IBM JIT 2x faster than HotSpot for Telegraph scans
Stability issues
www.javalobby.org/features/jpr

JIT Performance vs C
[Chart comparing three series: Optimized Intel, Optimized MS, IBM JIT]
Source: www.javalobby.org/features/jpr

Performance Gotchas
Synchronization
  ~2x function call overhead in HotSpot
  Used in libraries: Vector, StringBuffer
    • String allocation single most intensive operation in Telegraph
    • Mercator: 20% initial CPU cost
Garbage Collection
  Java dumb about reuse
  Mercator: 15% cost
  OceanStore: 30 ms avg latency, 1 s peak

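A minimal sketch of sidestepping library synchronization, assuming the data is touched by one thread only. Vector and StringBuffer lock on every call; ArrayList and StringBuilder offer the same operations without locking. StringBuilder was added in Java 5, after this talk, so this reflects later JDKs, and the class and method names are hypothetical. No speedup figure is claimed here, since JVMs differ in how cheaply they handle uncontended locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

public class UnsynchronizedAlternatives {
    /** Concatenates with StringBuffer, whose append() takes a lock on every call. */
    static String concatSynchronized(List<String> parts) {
        StringBuffer sb = new StringBuffer();
        for (String p : parts) sb.append(p);
        return sb.toString();
    }

    /** Same result with StringBuilder, which has the same API but no locking. */
    static String concatUnsynchronized(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(p);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> locked = new Vector<>();      // synchronized methods
        List<String> unlocked = new ArrayList<>(); // no synchronization
        for (String s : new String[] {"a", "b", "c"}) {
            locked.add(s);
            unlocked.add(s);
        }
        System.out.println(concatSynchronized(locked));     // prints abc
        System.out.println(concatUnsynchronized(unlocked)); // prints abc
    }
}
```
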
More Gotchas
Finalization
  Declaring methods final allows inlining
Serialization
  RMI, JNI use serialization
  Philippsen & Haumacher show that it performs poorly

Performance Tools
Tools to address some issues
  JAX, Jopt: make bytecode smaller, faster
    • www.alphaworks.ibm.com/tech/JAX
  www.condensity.com
    • Bytecode optimizer
  www.optimizeit.com
    • Good profiler, memory allocation and garbage collection monitor

JNI Issues
Not a part of Telegraph
JNI overhead quite large (JDK 1.1.8, PII 300 MHz)
Source: Matt Welsh. A System Supporting High-Performance Communication and I/O in Java. Master's Thesis, UC Berkeley, 1999.

More JNI
But, this is being worked on
  IBM JDK: 100,000 B copy in 5 ms, vs 23 ms for 1.1.8 (500 MHz PIII)
JNI allows synchronization (pin / unpin), thread management
  See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html
GCJ + CNI: access Java objects via C++ classes
  http://gcc.gnu.org/java/

Garbage Collection
Performance
  Big problem: 1 s or longer to GC lots of objects
  Most Java GCs blocking (not concurrent or multithreaded)
Unexpected Latencies
  OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC
  In high-concurrency apps, such delays disastrous

Garbage Collection Cont.
Limited Control
  Runtime.gc() only a hint
  Runtime.freeMemory() unreliable
  No way to disable
No object reuse
  Lots of unnecessary memory allocations

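A minimal sketch of working around the lack of object reuse with an application-level pool, assuming single-threaded use and fixed-size buffers. BufferPool is a hypothetical name and not part of Telegraph. The last lines show the Runtime calls mentioned above; gc() is only a request and the freeMemory() figure is approximate, so no output is asserted for them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations = 0; // number of times a fresh buffer had to be allocated

    public BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Hands out a recycled buffer when one is available, otherwise allocates. */
    public byte[] acquire() {
        byte[] buf = free.pollFirst();
        if (buf == null) {
            buf = new byte[bufferSize];
            allocations++;
        }
        return buf;
    }

    /** Returns a buffer to the pool; buffers of the wrong size are dropped. */
    public void release(byte[] buf) {
        if (buf.length == bufferSize) free.addFirst(buf);
    }

    public int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(200);
        for (int i = 0; i < 1000; i++) {
            byte[] tuple = pool.acquire();
            tuple[0] = (byte) i; // stand-in for filling the tuple
            pool.release(tuple);
        }
        System.out.println(pool.allocations()); // prints 1

        Runtime rt = Runtime.getRuntime();
        rt.gc(); // a request only; the JVM may ignore it
        System.out.println(rt.freeMemory() + " of " + rt.totalMemory() + " bytes free");
    }
}
```

Recycled buffers keep their old contents, so callers must overwrite or clear them before use.
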
Serialization
Not in Telegraph
Philippsen and Haumacher, "More Efficient Object Serialization." International Workshop on Java for Parallel and Distributed Computing. San Juan, April, 1999.
  Serialization costs for RMI are 50% of total RMI time
  Discard longevity for 7x speed up
Sun Serialization provides versioning
  Complete class description stored with each serialized object
  Most standard classes forward compatible (JDK docs note special cases)
  See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html

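A minimal sketch comparing default Java serialization with hand-written serialization for one small record. Reading and its fields are hypothetical. The default path writes a class description along with the field values, while the hand-written path writes only the two ints and gives up versioning. This illustrates the size overhead only; it is not the technique from the Philippsen and Haumacher paper.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationCost {
    /** Hypothetical record with two int fields. */
    static final class Reading implements Serializable {
        private static final long serialVersionUID = 1L;
        final int sensorId;
        final int value;

        Reading(int sensorId, int value) {
            this.sensorId = sensorId;
            this.value = value;
        }
    }

    /** Default serialization: stream header, class description, then field data. */
    static byte[] defaultSerialize(Reading r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Hand-written serialization: just the two ints, with no class metadata. */
    static byte[] handSerialize(Reading r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(r.sensorId);
            out.writeInt(r.value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Reading r = new Reading(7, 42);
        System.out.println("default: " + defaultSerialize(r).length + " bytes");
        System.out.println("hand-written: " + handSerialize(r).length + " bytes"); // 8 bytes
    }
}
```

The hand-written form only works when both ends agree on the layout in advance, which is the longevity being traded away.
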
Lots of Objects
GC Issues Serious
Memory Management
  GC makes programmers allocate willy-nilly
  Hard to partition memory space
  Telegraph byte-array ugliness due to inability to limit memory usage of concurrent modules, queries

Storage Overheads
Java Object class is big:
  Integer requires 23 bytes in JDK 1.3
  int requires 4.3 bytes
No way to circumvent object fields
Use primitives or hand-written serialization whenever possible

JDBC vs ODI
No experience with Oracle
JDBC overheads are high, but we don't have specific performance numbers

Bottom Line
Java great for many reasons
  GC, standard libraries, type safety, introspection, etc.
  Significant reductions in development and debugging time.
Java performance isn't bad
  Especially with some tuning
Memory Management an Issue
Lack of control over JVMs bad
  When to garbage collect, how to serialize, etc.