Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Java Garbage Collection, Byte Code James Atlas August 7, 2008 Review • Java Security JVM Cryptography August 7, 2008 James Atlas - CISC370 2 Schedule • Today Java Garbage Collection Java Bytecode • Tuesday Review • Thursday Final (5-7PM) August 7, 2008 James Atlas - CISC370 3 Java Garbage Collection August 7, 2008 James Atlas - CISC370 4 Garbage Collection • What is garbage and how can we deal with it? • Garbage collection schemes Reference Counting Mark and Sweep Stop and Copy • A comparison August 7, 2008 James Atlas - CISC370 5 How Objects are Created in Java • An object is created in Java by invoking the new() operator. • Calling the new() operator, the JVM will do the following: allocate memory; assign fields their default values; run the constructor; a reference is returned. August 7, 2008 James Atlas - CISC370 6 How Java Reclaims Objects Memory • Java does not provide the programmer any means to destroy objects explicitly • The advantages are No dangling reference problem in Java Easier programming No memory leak problem August 7, 2008 James Atlas - CISC370 7 What is Garbage? • Garbage: unreferenced objects john Object Student john= new Student(); Student jeff= new Student(); john=jeff; • john Object becomes garbage, john jeff because it is an unreferenced Object jeff Object August 7, 2008 James Atlas - CISC370 8 What is Garbage Collection? • What is Garbage Collection? • Finding garbage and reclaiming memory allocated to it. john Object • When is the Garbage Collection process invoked? When the total memory allocated to a Java program exceeds some threshold. john jeff • Is a running program affected by garbage collection? Yes, depending on the algorithm. Typically the program suspends during garbage collection. August 7, 2008 James Atlas - CISC370 jeff Object 9 Strategies for Handling Garbage • Modern Societies produce an excessive amount of waste? • What is the solution? Reduce Reuse Recycle • The same Applies to Java!!! August 7, 2008 James Atlas - CISC370 10 Reduce Garbage • A Java program that does not create any objects does not create garbage. • Objects used until the end of the program do not become garbage. • Reducing the number of objects used will reduce the amount of garbage generated. August 7, 2008 James Atlas - CISC370 11 Reuse Garbage • Reuse objects instead of generating new ones. for (int i=0;i<1000000; ++i) { SomeClass obj= new SomeClass(i); System.out.println(obj); } • This program generates one million objects and prints them out. SomeClass obj= new SomeClass(); for (int i=0;i< 1000000; ++i) { obj.setInt(i); System.out.println(onj); } • Using only one object and implementing the setInt() method, we dramatically reduce the garbage generated. August 7, 2008 James Atlas - CISC370 12 Recycle Garbage • Don't leave unused objects for the garbage collector. Put them instead in a container to be searched when an object is needed. • Advantage: reduces garbage generation. • Disadvantage: puts more overhead on the programmer. • Can anyone think of what design pattern this represents? August 7, 2008 James Atlas - CISC370 13 Garbage Collection Strategies • Reference Counting • Mark and Sweep • Stop and Copy August 7, 2008 James Atlas - CISC370 14 Reference Counting Garbage Collection • Main Idea: Add a reference count field for every object. • This Field is updated when the number of references to an object changes. • Example • Object p= new Integer(57); • Object q = p; August 7, 2008 p 57 refCount = 2 q James Atlas - CISC370 15 Reference Counting (cont'd) • The update of reference field when we have a reference assignment ( i.e p=q) can be implemented as follows if (p!=q) { if (p!=null) --p.refCount; p=q; if (p!=null) ++p.refCount; } Example: Object p = new Integer(57); Object q= new Integer(99); p=q p 57 refCount = 0 q 99 August 7, 2008 James Atlas - CISC370 refCount = 2 16 Reference Counting (cont'd) • What in case of indirect references? • We can still use reference counting, provided we consider all references to an object including references from other objects. • Object p = new Association(new Integer(57), new Integer(99)); August 7, 2008 James Atlas - CISC370 17 Reference Counting (cont'd) • When does reference counting fail? • When head is assigned to null, first object reference count becomes 1 and not zero • Reference counting will fail whenever the data structure contains a cycle of references head ListElements next next refCount = 1 August 7, 2008 ListElements refCount = 1 James Atlas - CISC370 ListElements next refCount = 1 18 Reference Counting (cont'd) • Advantages and Disadvantages + Garbage is easily identified. + Garbage can be collected incrementally. - Every object should have a reference count field. Overhead for updating reference count fields. It fails in the case of cyclic references. It does not de-fragment the heap August 7, 2008 James Atlas - CISC370 19 Mark-and-Sweep Garbage Collection • It is the first garbage collection algorithm that is able to reclaim garbage even for cyclic data structures. • Mark and sweep algorithm consists of two phases: mark phase sweep phase for each root variable r mark(r); sweep(); August 7, 2008 James Atlas - CISC370 20 Mark and Sweep (cont'd) void sweep(){ for each Object p in the heap { if (p.marked) p.marked=false; else heap.release(p); } } program August 7, 2008 James Atlas - CISC370 21 Mark and Sweep (cont'd) • Advantages It correctly identifies and collects garbage even in the presence of reference cycles. No overhead in manipulating references. • Disadvantages The program suspends while garbage collecting. It does not De-Fragment the heap. August 7, 2008 James Atlas - CISC370 22 Stop-and-Copy Garbage Collection • This algorithm collects garbage and defragments the heap. • The heap is divided into two regions: active and inactive. • When the memory in the active region is exhausted, the • • program is suspended and : Live objects are copied to the inactive region contiguously The active and in active regions reverse their roles • The Algorithm for each root variable r r=copy(r,inactiveHeap); swap (activeHeap,inactiveHeap); August 7, 2008 James Atlas - CISC370 23 Stop-and-Copy Garbage Collection (cont'd) head Object copy(Object p, Heap destination) { if (p==null) return null; if (p.forward==null) { q=destination.newInstance(p.class); p.forward= q; for each field f in p { if (f is primitive type) q.f=p.f; else q.f= copy(p.f, destination); } q.forward = null; } return p.forward; } inactive A’ active null B’ null C’ null August 7, 2008 James Atlas - CISC370 24 Stop-and-Copy Garbage Collection (cont'd) • Advantages It works for cyclic data structures It Defragments the heap. • Disadvantages All objects are copied when the garbage collector is invoked – it does not work incrementally. It requires twice as much memory as the program actually uses. August 7, 2008 James Atlas - CISC370 25 Java Garbage Collector History • 1.0-1.3 mark-and-sweep • 1.3 three memory spaces 1. Permanent space: used for JVM class and method objects Old object space: used for objects that have been around a while New (young) object space: used for newly created objects 2. 3. • Also broken into Eden, Survivor1 and Survivor2 allowed different techniques for each space August 7, 2008 James Atlas - CISC370 26 Java Garbage Collector History (cont’) • 1.3 techniques Copy-compaction: used for new object space. Mark-compact: used in old object space. • Similar to mark and sweep, mark-compact marks all unreachable objects; in the second phase, the unreachable objects compact. Incremental garbage collection (optional) • Incremental GC creates a new middle section in the heap, which divides into multiple trains. Garbage is reclaimed from each train one at a time. This provides fewer, more frequent pauses for garbage collection, but it can decrease overall application performance. • still all “stop-the-world” techniques August 7, 2008 James Atlas - CISC370 27 Java Garbage Collector History (cont’) • 1.4.1 introduced parallel GC algorithms Young Old Stop the world Multithreaded Concurrent Copying X X *Parallel copying X X X *Parallel scavenging X X X Incremental 1 (see note below) X Mark-compact X X *Concurrent X 2 (see note below) X Note 1: Subdivides the new generation to create an additional middle generation Note 2: Uses stop-the-world approach for two of its six phases August 7, 2008 James Atlas - CISC370 28 JVM Internals • The architecture JVM is an abstract concept Sun just specified the interface implementation details depend on specific product (SUN JDK, IBM JDK, Blackdown) • Java bytecode, the internal language independent from CPU-type (bytecode) Stackoriented, object-oriented, type-safe August 7, 2008 James Atlas - CISC370 29 Runtime view on a JVM Runtime Data storage Class loader Method Area (Classes) Heap (Objects) PC registers Stack Frames Native method stacks August 7, 2008 JVM runtime Native methods James Atlas - CISC370 30 Runtime data • Frame: • Saves runtime state of execution threads, therefore holds information for method execution (program counter) • All frames of a thread are managed in a stack frame August 7, 2008 James Atlas - CISC370 31 Runtime data • Method area Runtime information of the class file Type information Constant Pool Method information Field information Class static fields Reference to the classloader of the class Reference to reflection anchor (Class) August 7, 2008 James Atlas - CISC370 32 The Constant Pool • The "constant pool" is a heterogenous array of data. Each entry in the constant pool can be one of the following: string , class or interface name , reference to a field or method , numeric value , constant String value • No other part of the class file makes specific references to strings, classes, fields, or methods. All references for constants, names of methods, and fields are via lookup into the constant pool. August 7, 2008 James Atlas - CISC370 33 The Class File Structure H E A D E R CONSTANTPOOL FIELDS METHODS I ACCESS A N T FLAGS T T E R (Final, R I Native, F Private, B A Protected, U C T ...) E E S S • You can use a classdumper like javap -c or DumpClass to analyze these inner details August 7, 2008 James Atlas - CISC370 34 Javap -c -verbose example • Our Chicken.java class August 7, 2008 James Atlas - CISC370 35 The Class File Format • Java class files are brought into the JVM via the • • classloader The class file is basically just a plain byte array, following the rules of the byte code verifier. All 16-bit and 32-bit quantities are formed by reading in two or four 8-bit bytes, respectively, and joining them together in big-endian format. August 7, 2008 James Atlas - CISC370 36 Methods and Fields • The type of a field or method is indicated by a string • • called its signature. Fields may have an additional attribute giving the field's initial value. Methods have an additional CODE attribute giving the java bytecode for executing that method. August 7, 2008 James Atlas - CISC370 37 The CODE Attribute • maximum stack space • maximum number of local variables • The actual bytecode for executing the method. • A table of exception handlers, start and end offset into the bytecodes, an exception type, and the offset of a handler for the exception August 7, 2008 James Atlas - CISC370 38 Bytecode Basics August 7, 2008 James Atlas - CISC370 39 The JVM types • JVM-Types and their prefixes Byte Short Integer ints!) Long Character Single float double float References b s i (java booleans are mapped to jvm l c f d a to Classes, Interfaces, Arrays • These Prefixes used in opcodes (iadd, astore,...) August 7, 2008 James Atlas - CISC370 40 The JVM Instruction Mnemonics • • • • • • • • • Shuffling (pop, swap, dup, ...) Calculating (iadd, isub, imul, idiv, ineg,...) Conversion (d2i, i2b, d2f, i2z,...) Local storage operation (iload, istore,...) Array Operation (arraylength, newarray,...) Object management (get/putfield, invokevirtual, new) Push operation (aconst_null, iconst_m1,....) Control flow (nop, goto, jsr, ret, tableswitch,...) Threading (monitorenter, monitorexit,...) August 7, 2008 James Atlas - CISC370 41 Bytecode • Java Bytecode (JBC) are followed by zero or more • • • • bytes of additional operand information. Table lookup instructions (tableswitch, lookupswitch) have a flexible length The wide operation extension allows the base operations to use „large“ operands No self-modifying code No branching to arbitrary locations, only to beginning of instructions limited to scope of current method (enforced by verifier!) August 7, 2008 James Atlas - CISC370 42 Bytecode (Reverse) Engineering August 7, 2008 James Atlas - CISC370 43 Bytecode Engineering tools • Obfuscators Remove/Manipulate all information that can be used for reverse engineering • Native compilers „Real“ compile of java bytecodes to native instructions (x86/sparc) • Build your own bytecode Programmatic Generation Manipulate classfiles with an API August 7, 2008 James Atlas - CISC370 44 Obfuscators Techniques used • Identifier Name Mangling • • • The JVM does not need useful names for Methods and Fields They can be renamed to single letter identifiers Constant Pool Name Mangling • • Decrypts constant pool entries on runtime Control flow obfuscation • • Insertion of phantom variables, stack scrambling And by relying on their default values inserting ghost branch instructions, which never execute August 7, 2008 James Atlas - CISC370 45 Obfuscators Problems with Obfuscation • Constant value Mangling implies overhead processing in extra method call of an „deobfuscatename“ method in each retrieval from constant pool • Dynamic class loading may become broken as classes get new names and reflection calls like class.forName(„Account“) will fail because class „Account“ now known as by it‘s obfuscated name „b16“! • And: Obfuscation breaks patterns that can be recognized by JIT-engines for optimization August 7, 2008 James Atlas - CISC370 46 Protecting the Source Code: Native Compilers • Convert Java bytecode to C • Generate executable via normal c-build fast execution Additional decompilation effort needed Long turnaround times Even for small java programs you get monster size executable files (67mb source for Viva.java) from some commercial products Transformed program may than be vulnerable to buffer overflows and off-by-ones August 7, 2008 James Atlas - CISC370 48 Bytecode Reverse Engineering • Decompilation Get Source code from class files • Graphical Analysis Rebuild the logical control flow • Disassembly Get symbolic bytecode from class files August 7, 2008 James Atlas - CISC370 49 JAD • Java Decompiler Free for personal use JADClipse plugin for Eclipse - allows you to browse .class files August 7, 2008 James Atlas - CISC370 50