Java Virtual Machine, JVM

Teodor Rus
The University of Iowa, Department of Computer Science

These slides have been developed by Teodor Rus. They are copyrighted materials and may not be used in other course settings outside of the University of Iowa in their current form or modified form without the express written permission of the copyright holder. During this course, students are prohibited from selling notes to or being paid for taking notes by any person or commercial firm without the express written permission of the copyright holder.

Introduction to System Software – p.1/64

Target of the assembler

The target of the assembler in this class is the language of a virtual machine (VM):

    VM = ⟨Processor, Program, ExecutionModel⟩

Note:
1. The VM should be such that it can be used to simulate the computation of real machines;
2. The Java Virtual Machine is such a machine.

Rationale

1. A virtual machine of this kind (under the name P-machine) was successfully used as a target in many compiler design and implementation projects;
2. The JVM is successfully used as an abstract machine simulating the computation performed by current real machines in Java language environments;
3. Interpreters simulating the execution of JVM programs on real hardware are widely implemented and accepted;
4. Oolong, the assembly language of the JVM, is available;
5. Finally, this provides a good educational experience.

Processor abstraction

• A processor abstraction needs to represent any concrete hardware; hence it should consist of a virtual computer and an implementation.
• Once the virtual computer is implemented on a particular system, all programs written for the virtual computer will run on that system.
• This allows programmers to write programs once (for the virtual computer) and run them anywhere.
(Java slogan)

Fact

• The virtual computer operates on an abstract memory, handling objects rather than bits and bytes.
• That is, the virtual computer hides the complexities of real hardware, such as:
  1. Memory structure and addressing;
  2. Intricacies of instruction patterns;
  3. Program control and data flows.

JVM specification

The JVM is a computer abstraction defined by:
• The set of operations that it performs, called bytecodes;
• The structure of the programs the JVM can execute, called the class file format (CFF);
• The verification algorithm that ensures the integrity of JVM programs.

Program execution

1. The JVM takes its instructions from the CFF;
2. Operations performed by the JVM take their operands from a stack and leave their results on the stack. Hence address computation is not a problem;
3. The JVM operates on objects rather than on bits and bytes. Hence, interpretation is the same for all instructions.

JVM instructions

JVM instructions are classified into 6 groups, among them:
1. Instructions whose operands are on top of the stack. Examples: add, mul, div, etc.;
2. Instructions for object allocation;
3. Instructions for method invocation;
4. Instructions for retrieving and modifying fields in objects;
5. Instructions for moving information between the stack and objects. Examples: load n (pushes the value of local variable n onto the stack); store n (stores the value on top of the stack into local variable n).

Example

Consider the following JVM code:

    getstatic java/lang/System/out Ljava/io/PrintStream;
    ldc "Hello, world"
    invokevirtual java/io/PrintStream/println (Ljava/lang/String;)V

Example, continuation

The meaning of this code is:
1.
Retrieve the value of the out field in the class java/lang/System and push it on the stack; this is an object of the class java/io/PrintStream;
2. Push the constant "Hello, world" on the stack;
3. Invoke the method println, which is defined in the class java/io/PrintStream and expects the stack to contain an object of java/lang/String and a reference to out, an object of the class java/io/PrintStream.

Class File Format, CFF

• Represents a Java class as a stream of bytes; the Java platform has methods for converting Java class files into classes in the JVM.
• A CFF is not necessarily a file; it can be stored in a database, sent across the network, kept as part of a Java archive file (JAR), etc.
• The CFF is standardized and is manipulated by the ClassLoader, part of the Java platform.

Note

• If one stores a CFF in a nonstandard form, then one needs to construct an appropriate ClassLoader to handle it.

Verification algorithm

• Purpose: ensures that programs follow a set of rules designed to protect the integrity of JVM programs;
• The verification algorithm performs an abstract interpretation of the CFF. If this fails, the JVM program in the CFF is aborted.

Note: this doesn't mean that one cannot write a JVM program that, while conforming to the rules implemented by the verification algorithm, violates the integrity of the JVM.

Java Platform

• The JVM performs fundamental computational tasks, but it lacks features for doing computer-oriented things like graphics, Internet communications, etc.
• The Java platform includes the JVM and a collection of classes that are organized into packages under the name java.
• Examples of such packages: java.applet, java.io, java.awt (abstract window toolkit), java.security, etc.

Assumptions

• The JVM cannot function independently of the Java platform.
• We assume further that the Java platform contains java.lang.Object, java.lang.ClassLoader, java.lang.String, and java.lang.Class.
• Note the dot notation for Java and the slash notation for the JVM.

JVM architecture

The JVM is divided into four conceptual data spaces:
• Class area, where the JVM program (consisting of bytecodes and constants) is kept;
• Java stack, which keeps track of which methods have been called and the data associated with each method invocation;
• Heap, where objects are kept;
• Native method stacks, for supporting native methods.

Class area

Stores the classes loaded into the system; each class is defined in terms of the following properties:
• Its superclass;
• List of interfaces (possibly empty);
• List of fields;
• List of methods and their implementations, stored in the method area;
• List of constants, stored in the constant pool.

All properties of a class are immutable (i.e., unchangeable).

Class descriptors

• Each field is defined by a descriptor that shows the properties of the object occupying that field, such as whether it is static or not;
• For nonstatic fields there is a copy in each object of the class; for static fields there is a single copy for the entire class of objects;
• Each method is defined by a descriptor that shows the method type and the method modifiers, such as abstract, static, etc.;
• An abstract method has no implementation; a non-abstract method has an implementation defined in terms of JVM instructions.
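The properties listed above (static versus nonstatic fields, abstract versus implemented methods, dot versus slash notation) can be observed from Java itself through reflection. The following is a minimal sketch, not part of the original slides: the classes Player and Chess and the helper class ClassAreaDemo are hypothetical names chosen for illustration only.

```java
import java.lang.reflect.Modifier;

// Hypothetical classes used only for this illustration.
abstract class Player {
    static int created = 0;  // static field: a single copy for the whole class
    int color;               // nonstatic field: one copy in each object

    Player(int color) {
        this.color = color;
        created++;           // every constructor call updates the same shared copy
    }

    abstract String move();  // abstract method: no implementation

    String describe() {      // non-abstract method: has an implementation
        return "color=" + color;
    }
}

final class Chess extends Player {
    Chess(int color) { super(color); }

    @Override
    String move() { return "e2e4"; }
}

final class ClassAreaDemo {
    // True if the field declared in c under this name carries the static modifier.
    static boolean isStaticField(Class<?> c, String name) {
        try {
            return Modifier.isStatic(c.getDeclaredField(name).getModifiers());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + name, e);
        }
    }

    // True if the parameterless method declared in c under this name is abstract.
    static boolean isAbstractMethod(Class<?> c, String name) {
        try {
            return Modifier.isAbstract(c.getDeclaredMethod(name).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    // Converts Java dot notation (java.lang.String) to JVM slash notation (java/lang/String).
    static String slashName(Class<?> c) {
        return c.getName().replace('.', '/');
    }
}
```

For example, ClassAreaDemo.slashName(String.class) yields java/lang/String, and creating two Chess objects increases Player.created by two, since both constructor calls update the single static copy.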
Example of class representation

Figures 1 and 2 depict two class areas.

    ClassName:  GamePlayer
    Superclass: java/lang/Object
    Fields:     name     dscrptr: Ljava/lang/String;   modifs: none
    Methods:    main     dscrptr: Ljava/lang/String;   modifs: public, static   -> main method implementation

Figure 1: The GamePlayer class representation

Example, continuation

    ClassName:  ChessPlayer
    Superclass: GamePlayer
    Fields:     color    dscrptr: I          modifs: private
                piece    dscrptr: I          modifs: static
    Methods:    getMove  dscrptr: ()LMove;   modifs: public   -> getMove method implementation

Figure 2: The ChessPlayer class representation

JVM stack

The JVM operates on a stack of stack frames.
• A stack frame consists of three elements:
  1. The operand stack, which contains the operands of the operations performed by the JVM;
  2. The array of local variables of the method;
  3. The program counter, PC, which starts at the first instruction of the method.

Execution model

• Each time a method is invoked, a new stack frame is created and pushed on the JVM stack;
• When a method terminates, its stack frame is popped.

The JVM performs the loop:

    while (PC.opcode != Halt) {
        Execute(PC);
        PC := Next(PC);
    }

More on execution model

• The top frame of the JVM stack shows the currently executing method and is called the active frame (AF);
• Only the operand stack and the local variable array in the active frame can be used during JVM program execution;
• Each operation performed by the JVM evaluates an expression whose operands are on the operand stack and leaves the result on the operand stack;
• When a method calls another method, the PC of the caller is saved; when the callee completes, the result is on top of the operand stack and the caller is resumed using the saved PC and the caller's array of local variables.
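The fetch-execute loop and the frame structure described above can be sketched as a very small interpreter. This is a minimal illustration under stated assumptions, not the real JVM: the class TinyStackMachine and its opcode numbers (HALT, PUSH, LOAD, STORE, ADD, MUL) are hypothetical, a single frame is modeled, and all values are ints.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyStackMachine {
    // Hypothetical opcodes for this sketch; they are NOT real JVM bytecode values.
    static final int HALT = 0, PUSH = 1, LOAD = 2, STORE = 3, ADD = 4, MUL = 5;

    // Runs one "method". The code is a flat int array; an operand, if any, follows its opcode.
    static int run(int[] code, int maxLocals) {
        Deque<Integer> operands = new ArrayDeque<>(); // operand stack of the frame
        int[] locals = new int[maxLocals];            // local variable array of the frame
        int pc = 0;                                   // program counter

        while (code[pc] != HALT) {                    // while (PC.opcode != Halt)
            switch (code[pc]) {                       // Execute(PC), then PC := Next(PC)
                case PUSH:
                    operands.push(code[pc + 1]);
                    pc += 2;
                    break;
                case LOAD:                            // load n: local variable n -> stack
                    operands.push(locals[code[pc + 1]]);
                    pc += 2;
                    break;
                case STORE:                           // store n: top of stack -> local variable n
                    locals[code[pc + 1]] = operands.pop();
                    pc += 2;
                    break;
                case ADD:
                    operands.push(operands.pop() + operands.pop());
                    pc += 1;
                    break;
                case MUL:
                    operands.push(operands.pop() * operands.pop());
                    pc += 1;
                    break;
                default:
                    throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
        return operands.pop();                        // the result is left on top of the operand stack
    }
}
```

A program such as PUSH 2, STORE 0, PUSH 3, LOAD 0, ADD, PUSH 4, MUL, HALT computes (3 + 2) * 4 with no explicit operand addresses, which is the point made by the slides: operands come from the stack and results go back onto it.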
The Heap

• Each object is associated with a class (its type) in the class area and is stored in the heap.
• Each object has a number of slots for storing fields; there is one slot for each nonstatic field in the class associated with the object.
• Each object has a number of slots storing methods that operate on that object; there is one method for each abstract method of the class associated with the object.

Example object

Figure 3 shows the heap representation of an object of the class ChessPlayer.

[Figure 3: An object of the class ChessPlayer. The object holds a reference to its class (ChessPlayer, whose superclass is GamePlayer), a reference to the player's name (a java/lang/String with the value Pooky), and the fields pieces: 16 and color: 1.]

Native method stacks

• Native methods are methods implemented in languages other than the JVM language;
• Native methods allow the programmer to handle situations that cannot be handled completely in Java, such as interfacing with platform-dependent features or legacy code;
• Native methods are executed using C-like stacks;
• Native methods do not exist on all JVM implementations; moreover, different JVM implementations may have different standards for native methods;
• The standard Java Native Interface, JNI, should be available for native method documentation.

Garbage collection

• Each object consumes some memory from the heap;
• Eventually the memory allocated to a JVM object is reclaimed;
• The JVM reclaims an object's memory automatically through a process called garbage collection;
• An object is ready to be garbage collected when it is no longer “alive”.

Object liveness

The rules that determine whether an object is alive are:
1. If there is a reference to the object on the stack, then the object is alive;
2. If there is a reference to the object in a local variable on the stack or in a static field, then the object is alive;
3.
If a field of an alive object contains a reference to the object, then the object is alive;
4. The JVM may internally keep references to certain objects, for example to support native methods. These objects are alive.

Verification process

• Ensures that class files follow certain rules;
• Allows the JVM to assume that a class has certain safety properties and to make optimizations based on this;
• Makes it possible to safely download Java applets from the Internet;
• The Java compiler generates correct code. However, a programmer who writes JVM code directly can bypass the compiler's restrictions. The verification algorithm checks for this.

How does it work?

It asks questions about the CFF, such as:
• Is it a structurally valid class?
• Are all constant references correct?
• Are all instructions valid?
• Will the stack and locals contain values of the appropriate type?
• Do the classes used really exist, and are they correct?

JVM machine language syntax

• Level 0: bytecodes, indices into the CFF (integers), indices into the array of local variables, constant tags;
• Level 1: constants and instructions;
• Level 2: Class File Format, CFF.

JVM codes

1. The JVM uses Unicode character codes (rather than ASCII or EBCDIC). The Unicode Consortium manages these codes;
2. Unicode was designed so that it can accommodate any known character set used by people's alphabets;
3. The Unicode Transformation Formats UTF-8, UTF-16, and UTF-32 are Unicode character representations built from 1-byte, 2-byte (half-word), and 4-byte (word) units, respectively.
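To make the difference between the three transformation formats concrete, the sketch below counts how many bytes a string occupies in each. It is an illustration only: the class name UnicodeLengths is hypothetical, UTF-16 is measured without a byte-order mark, and the UTF-32 size is computed as four bytes per code point rather than by calling an encoder.

```java
import java.nio.charset.StandardCharsets;

final class UnicodeLengths {
    // Returns { UTF-8 bytes, UTF-16 bytes, UTF-32 bytes } for the given string.
    static int[] encodedLengths(String s) {
        int utf8 = s.getBytes(StandardCharsets.UTF_8).length;
        // UTF_16BE is used so that no byte-order mark is added to the count.
        int utf16 = s.getBytes(StandardCharsets.UTF_16BE).length;
        // UTF-32 always uses one 4-byte unit per code point.
        int utf32 = 4 * s.codePointCount(0, s.length());
        return new int[] { utf8, utf16, utf32 };
    }

    public static void main(String[] args) {
        // Code points U+0041, U+00E9, U+20AC, U+1F600 (the last is a surrogate pair in Java).
        String[] samples = { "A", "\u00e9", "\u20ac", "\uD83D\uDE00" };
        for (String s : samples) {
            int[] n = encodedLengths(s);
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0))
                    + ": UTF-8=" + n[0] + " UTF-16=" + n[1] + " UTF-32=" + n[2]);
        }
    }
}
```

The unit sizes are fixed (1, 2, and 4 bytes), but in UTF-8 and UTF-16 a single character may take more than one unit; only UTF-32 uses exactly one unit per character.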
Constant tags

Table 1: Constant tags

    Tag  Type         Format   Interpretation
    1    UTF8         2+n      First 2 bytes encode the length n, followed by n bytes of the text of the constant
    2    undefined
    3    Integer      4 bytes  Text of a signed integer
    4    Float        4 bytes  Text of an IEEE 754 floating-point number
    5    Long         8 bytes  Text of a long signed integer
    6    Double       8 bytes  Text of an IEEE 754 double-precision number
    7    Class        2 bytes  Reference to the class name, a UTF8 constant
    8    String       2 bytes  Reference to the string text, a UTF8 constant
    9    FieldRef     4 bytes  First 2 refer to a Class constant, second 2 to a NameAndType constant (tag 12 below)

Constant tags, continuation

Table 2: Constant tags

    Tag  Type                Format   Interpretation
    10   MethodRef           4 bytes  Same as FieldRef
    11   InterfaceMethodRef  4 bytes  Same as FieldRef
    12   NameAndType         4 bytes  First 2 point to the name, second 2 point to the descriptor. Both are UTF8 constants

Is CFF structurally valid?

• The first 4 bytes of the CFF must contain the hex values CA FE BA BE, which is the magic number;
• Following the magic number are the minor and major versions; each takes two bytes, interpreted as a 16-bit unsigned integer.
  Example: JDK 1.0, 1.1: Major = 0x2D (45), Minor = 0x3 (3); Java 2: Major = 0x2E (46), Minor = 0; if Major = 45 then Minor > 3.
• Figure 4 shows the structure of a CFF.

Structure of the CFF

    Magic# | Minor | Major | CnstPool | Class | Super | Interface | Fields | Methods

Figure 4: Structure of a properly formatted CFF

More on CFF structure

• Most sections begin with a count, which is a two-byte unsigned integer, followed by count instances of some pattern of bytes;
• Example (see the tags in Tables 1 and 2):
  1. The constant pool starts with a count followed by as many constant patterns as it specifies;
  2. Each constant pattern consists of a one-byte tag and a number of bytes in which the constant is written;
  3.
The tag describes the kind of constant that follows and how many bytes it takes;
  4. If any tag is invalid, or the file ends before the correct number of constants is found, then the CFF is rejected.

Check constant references

• Class and String constants must have references to UTF8 constants;
• FieldRef, MethodRef, and InterfaceMethodRef constants must have a class index that refers to a Class constant and a name-and-type index;
• NameAndType constants must have two indices pointing to UTF8 constants.

Example JVM code

Figure 5 shows a portion of the code:

    .class Foo
    .super Bar
    .implements Baz
    .field field1 LFoo;
    .method isEven (I)Z
    ;
    ; ...
    .end method

Are all constant references correct?

    Constant pool count: 16^2 = 256
    1   UTF8   isEven
    2   UTF8   (I)B
    3   UTF8   field1
    4   UTF8   LFoo
    5   UTF8   Baz
    6   Class  name index = 5
    7   UTF8   Bar
    8   Class  name index = 7
    9   UTF8   Foo
    10  Class  name index = 9
    This class index (10 = Foo)
    Superclass index (8 = Bar)
    Interface count = 1
    Interface index (6 = Baz)
    Fields count = 1
    There are no field flags
    Field name index (3 = field1)
    Field descriptor index (4 = LFoo)
    Field attributes count = 0
    Method count = 1
    There are no method flags
    Method name index (1 = isEven)
    Method descriptor index (2 = (I)B)
    Method attributes count
    Method attributes

Figure 5: Interpretation of the CFF bytes for the class Foo

Are all instructions valid?

Once we know that the overall class structure is valid, we can look at the method bodies to check whether the instructions are correctly formatted.

Problem to be solved

• Does each instruction begin with a recognized opcode?
• If the instruction takes a constant pool reference as an argument, does it point to an actual constant pool entry with the correct type?
• If the instruction uses a local variable, is the local variable index within the correct range?
• If the instruction is a branch, does it point to the beginning of an instruction?

A closer look at CFF

Consider the Java "hello world" program:

    public class hello {
        public static void main(String argv[]) {
            System.out.println("Hello, world");
        }
    }

Note: the file hello.java, containing this program, is mapped by the Java compiler (javac hello.java) into the CFF file hello.class, which is interpreted by the JVM. To understand the CFF we look at the file hello.class.

Notation

We represent the CFF in three columns:
1. Left column: offset, in hex, into the CFF;
2. Middle column: bytes at the offset location, in hex;
3. Right column: interpretation of the middle column by the JVM.

Example

File header

    000000  cafebabe        Magic = ca fe ba be
    000004  0003            Minor version = 3
    000006  002d            Major version = 2*16 + 13 = 45

Constant pool

    000008  0020            There are 2 * 16 = 32 constants in the pool
    00000a  08001f          1: a string at index 16 + 15 = 31 in the constant pool
    00000d  07001d          2: a class name at index 16 + 13 = 29 in the constant pool
    000010  070018          3: a class name at index 16 + 8 = 24 in the constant pool
    000013  07000e          4: a class name at index 14 in the constant pool
    000016  070013          5: a class name at index 19 in the constant pool
    000019  090002000a      6: FieldRef: class index 2, name-and-type index 10
    00001e  0a00040009      7: MethodRef: class index 4, name-and-type index 9
    000023  0a0003000b      8: MethodRef: class index 3, name-and-type index 11
    000028  0c000c0017      9: NameAndType: name index 12, descriptor index 23
    00002d  0c0016001c      10: NameAndType: name index 22, descriptor index 28
    000032  0c001b001e      11: NameAndType: name index 27, descriptor index 30
    000037  010007          12: UTF8, length 7
    00003a  7072696e746c6e  println

Constant pool, continuation

    000041  01000d                      13: UTF8, length 13
    000044  436f6e7374616e7456616c7565  ConstantValue
    000051  010013                            14: UTF8, length 19
    000054  6a6176612f696f2f5072696e74537472  java/io/PrintStream
    000067  01000a                            15: UTF8, length 10
    00006a  457863657074696f6e73              Exceptions
    000074  01000a                            16: UTF8, length 10
    000077  68656c6c6f2e6a617661              hello.java
    000081  01000f                            17: UTF8, length 15
    000084  4c696e654e756d6265725461626c65    LineNumberTable
    000093  01000a                            18: UTF8, length 10
    000096  536f7572636546696c65              SourceFile
    0000a0  010005                            19: UTF8, length 5
    0000a3  68656c6c6f                        hello

Constant pool, continuation

    0000a8  01000e                            20: UTF8, length 14
    0000ab  4c6f63616c5661726961626c6573      LocalVariables
    0000b9  010004                            21: UTF8, length 4
    0000bc  436f6465                          Code
    0000c0  010003                            22: UTF8, length 3
    0000c3  6f7574                            out
    0000c6  010015                            23: UTF8, length 21
    0000c9  284c6a6176612f6c616e672f53747269  (Ljava/lang/String;)V
    0000de  010010                            24: UTF8, length 16
    0000e1  6a6176612f6c616e672f4f626a656374  java/lang/Object
    0000f1  010004                            25: UTF8, length 4
    0000f4  6d61696e                          main
    0000f8  010016                            26: UTF8, length 22
    0000fb  285b4c6a6176612f6c616e672f537472  ([Ljava/lang/String;)V

Constant pool, continuation

    000111  010006                            27: UTF8, length 6
    000114  3c696e69743e                      <init>
    00011a  010015                            28: UTF8, length 21
    00011d  4c6a6176612f696f2f5072696e745374  Ljava/io/PrintStream;
    000132  010010                            29: UTF8, length 16
    000135  6a6176612f6c616e672f53797374656d  java/lang/System
    000145  010003                            30: UTF8, length 3
    000148  282956                            ()V
    00014b  01000c                            31: UTF8, length 12
    00014e  48656c6c6f2c20776f726c64          Hello, world

Constant entries

• The first constants are strings codified as UTF8 entries.
• Strings are followed by small constants, 3, 4, 5, etc. (of which there are none in the example), codified on a byte.
• These are followed by integer and long constants, codified as two's complement signed integers on 32 and 64 bits respectively.
• Float and double constants are codified as shown in Table 3.

Other fields

Fields, Methods, and Class entries:
• Constants with tags 9, 10, and 11 have identical formats. They are used to refer to fields and methods in field and method instructions such as getfield, putstatic, and invokevirtual.
• Example: constant 7 in the constant pool is 0a 0004 0009, i.e.:
  1. 0a = 10, so it is a MethodRef;
  2. The class containing the method is at index 4, whose name is at index 14, i.e., java/io/PrintStream;
  3. The name and descriptor are at index 9: name index 12 (println), descriptor index 23 [(Ljava/lang/String;)V].
  This is enough information to call the method. Constant 7 is used to code the arguments of Oolong instructions.

Class information

• Following the constant pool is the information about the class itself, which consists of name, type, and access flags, as seen below.
• Example, hello.java:

    00015b  0021  two bytes, access flags = 33
    00015d  0005  two bytes, index of this class in constant pool, 5
    00015f  0003  two bytes, index of superclass in constant pool, 3
    000161  0000  two bytes, number of interfaces, 0

Access flags are interpreted as a bit vector, as seen below:

    Bit   Name           Meaning
    1     ACC_PUBLIC     The class is public
    2-4                  Not used
    5     ACC_FINAL      The class is final
    6     ACC_SUPER      Superclass methods are treated specially by invokespecial
    7-9                  Not used
    10    ACC_INTERFACE  The class is an interface
    11    ACC_ABSTRACT   The class is abstract

Fields and Methods

After the class information come the two-byte counts that give the number of fields and the number of methods. In our example they are:

    000163  0000  Number of fields is zero
    000165  0002  There are two methods in this class

Fields and methods have identical formats.
    000167  0009  access flags of the method = 9
    000169  0018  name of the method is index 24 in constant pool (main)
    00016b  001a  descriptor of the method has index 26 in constant pool

Method access flags

These are specified in the table:

    Bit  Name              Meaning
    1    ACC_PUBLIC        The field/method is public
    2    ACC_PRIVATE       The field/method is private
    3    ACC_PROTECTED     The field/method is protected
    4    ACC_STATIC        The field/method is static
    5    ACC_FINAL         The field/method is final
    6    ACC_SYNCHRONIZED  The method is synchronized
    7    ACC_VOLATILE      The field is volatile
    8    ACC_TRANSIENT     The field is transient
    9    ACC_NATIVE        The method is native
    10                     Unused
    11   ACC_ABSTRACT      The method is abstract

Attributes

• After the general method or field information, the CFF contains a list of attributes.
• Fields and methods have different kinds of attributes. Methods have a single attribute giving the implementation of the method; most fields have no attributes at all.
• Only the ConstantValue attribute is defined for fields.
• Attributes for methods are represented as shown below.

Attributes for methods

    00016d  0001      1 method attribute: method attribute 0 follows
    00016f  0015      Name: at index 21 in constant pool, Code
    000171  00000025  Length of the attribute is 37
    000175  0002      Maximum stack is 2 slots
    000177  0001      Maximum space for locals is 1

The actual byte code

    Disp.   Bytecode  Addr  Interpretation
    000179  00000009        Code length: 9 bytes
    00017d  b20006    0000  getstatic #6, index 6 in constant pool
    000180  1201      0003  ldc #1, index 1 in constant pool
    000182  b60007    0005  invokevirtual #7, index 7 in constant pool
    000185  b1        0008  return

Note: code of length up to 4G bytes (2^32) is allowed; however, other constraints limit code size to 64K.

Observations

1.
There are two forms of the ldc instruction, ldc and ldc_w: ldc requires a one-byte argument interpreted as an index 0..255 into the constant pool; ldc_w requires a two-byte argument that may refer to any constant;
2. In either case the constant pool entry must be an Integer, Float, or String; Long and Double constants are loaded with ldc2_w.

Exception table

Following the byte code is the exception table, which begins with a two-byte count, the number of entries:

    000186  0000  there are no exceptions in this method

Note: following the exception handler table, the code attribute may have attributes of its own, such as debugging info.

Main method

The main method has one attribute, LineNumberTable:

    000188  0001      1 code attribute: code attribute 0 follows
    00018a  0011      Name: index 17 in constant pool: LineNumberTable
    00018c  0000000a  Length of attribute: 10
    000190  0002      Number of entries: 2
    000192  0000      Start PC: 0
    000194  0005      Line number: 5
    000196  0008      Start PC: 8
    000198  0003      Line number: 3

Method 1

Starts after the code attribute of method 0:

    00019a  0000                Access flags = 0
    00019c  001c                Name: index 28 in constant pool (<init>)
    00019e  001e                Descriptor: index 30 in constant pool, ()V
    0001a0  0001                1 method attribute: method attribute 0
    0001a2  0015                Name: index 21 in constant pool (Code)
    0001a4  0000001d            Length of the attribute: 29
    0001a8  0001                Maximum stack: 1
    0001aa  0001                Maximum locals: 1
    0001ac  00000005            Code length: 5
    0001b0  2a        00000000  aload_0
    0001b1  b70008    00000001  invokespecial #8
    0001b4  b1        00000004  return

Method 1, continuation

    0001b5  0000       0 exception table entries
    0001b7  0001       1 code attribute: code attribute 0:
    0001b9  0011       Name: index 17 in constant pool (LineNumberTable)
    0001bb  00000006   Length of attribute: 6
    0001bf  0001       Length of table: 1
    0001c1  0000 0001  Start PC: 0, Line number: 1

Class attributes

• The CFF ends with a list of class attributes.
• A
class can have any attributes it wants, but only the SourceFile attribute is defined in the Java specification.

    0001c5  0001      1 class file attribute
                      Attribute 0:
    0001c7  0012      Name: index 18 in constant pool (SourceFile)
    0001c9  00000002  Length: 2 bytes
    0001cd  0010      Name: index 16 in constant pool (hello.java)
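The first structural check described in these slides (magic number, then minor and major version, then the constant pool count) can be sketched with a few reads from a DataInputStream, which reads big-endian values just as the CFF stores them. This is a minimal sketch under stated assumptions, not a class file verifier: the class name ClassFileHeader is hypothetical and only the first ten bytes are examined.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ClassFileHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads the magic number, the versions, and the constant pool count from the start of a CFF.
    static ClassFileHeader parse(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int magic = in.readInt();                 // 4 bytes, must be CA FE BA BE
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file: bad magic number");
            }
            int minor = in.readUnsignedShort();       // 2 bytes, 16-bit unsigned
            int major = in.readUnsignedShort();       // 2 bytes, 16-bit unsigned
            int count = in.readUnsignedShort();       // 2 bytes, constant pool count
            return new ClassFileHeader(minor, major, count);
        } catch (IOException e) {
            // Includes the case where the input ends before the header is complete.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first ten bytes of hello.class as listed in the slides.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            0x00, 0x03, 0x00, 0x2D, 0x00, 0x20
        };
        ClassFileHeader h = parse(header);
        System.out.println("minor=" + h.minor + " major=" + h.major
                + " constant pool count=" + h.constantPoolCount);
    }
}
```

Applied to the header bytes of hello.class shown earlier (cafebabe 0003 002d 0020), the sketch reports minor version 3, major version 45, and a constant pool count of 32.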