Formalising Java Safety –
An overview
Pieter H. Hartel
[email protected]
2003.10.02 (Thursday)
박숙영
HPCC Lab
Contents
 Introduction
 Methodology
 Java Semantics
 The compiler
 Java extensions
 Small footprint devices
 Conclusions
Introduction 1/2
 Java is a safe programming language
 Type safe and memory safe
 The two main features
 Java does not offer pointer arithmetic
 Java offers references to objects
 Unused objects are automatically garbage
collected
 Java is a strongly typed language
 Java performs runtime checks to avoid array index
errors
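The runtime check on array indices can be observed directly. The following is a minimal sketch; the class name `BoundsCheckDemo` and the helper `readAt` are illustrative names, not part of the original slides.

```java
public class BoundsCheckDemo {
    // Returns the element at index i, or -1 if the runtime bounds check rejects the access.
    public static int readAt(int[] data, int i) {
        try {
            return data[i];
        } catch (ArrayIndexOutOfBoundsException e) {
            // The check fired before any memory outside the array was touched.
            return -1;
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(readAt(data, 1)); // valid index
        System.out.println(readAt(data, 3)); // one past the end, rejected at run time
    }
}
```

In a language with pointer arithmetic the second access could silently read a neighbouring memory cell; here it raises an exception instead.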
Introduction 2/2
 Class loader
 Accepting and loading JVM programs into the Java
runtime environment
 Byte code verifier
 Another type checker operating on the JVM byte codes.
 Both do their work before execution of the code from a newly loaded class starts.
Methodology 1/4
 Formal specifications
 The semantics of Java
 The semantics of the JVM language
 The Java to JVM compiler
 The runtime support, that is, parts of the Java API, including all java.* classes
Methodology 2/4
 The methodology to build these
specifications
 Construct clear and concise formal
specifications of the relevant components
 Validate the specifications by animating them,
and by stating and proving relevant properties
of the components.
 Refine the specifications into implementations
 Create all specifications in machine-readable
form
Methodology 3/4
 Principal difficulties
 Multi-threading, exception handling, object
orientation and garbage collection
 Sources that need careful consideration
 The documentation is ambiguous, inconsistent and incomplete
 The reference implementation is complex
Methodology 4/4
 Popular assumptions
 Unlimited memory
 Individual storage locations can hold all
primitive data types
 Individual JVM program locations can hold all
byte code instructions
Java and JVM language features
 IM: Imperative core consisting of basic data,
expressions and statements
 OO: Object orientation, i.e. Objects, classes,
interfaces, and arrays
 TY: The Java type system, or byte code verification
in the JVM
 CL: Class loading
 EH: Exception handling
 MT: Multi-threading, monitors, synchronisation
 GC: Garbage collection
Java Semantics
 Table 1
Object Orientation
 Alves-Foss and Lam [1]
 denotational semantics of most of Java
 detail on the various basic data types in Java
 Better understanding
The type system 1/2
 Based on simple subtyping
 One novel feature
 Java offers interfaces as a way of creating multiple inheritance
 Drossopoulou and Eisenbach[24]
 Static semantics and dynamic semantics of a relatively small
subset of Java
 Drossopoulou et al[23]
 Extend their subset to include exception handling
 Syme [55]
 Uses the DECLARE system to give proofs
 Uncovered 40 errors made during the translation
 Found two non-trivial errors in the hand-written proofs of Drossopoulou and Eisenbach
The type system 2/2
 Nipkow and von Oheimb[45]
 Prove type soundness of a subset similar to that of Drossopoulou et al.
 In contrast to Drossopoulou et al.
 Use Isabelle/HOL to machine-check the proofs from the
outset
 Higher degree of confidence in the correctness of the
specifications and the proofs
 Not able to validate the specifications
 Due to the lack of support for generating executable
semantics[58]
 Glesner and Zimmermann[26]
 Specify the type system for a small fragment of Java
Class Loader
 Wragg et al. [62]
 Offer a model of class loading for a relatively small subset of Java to study one of Java’s more experimental features (binary compatibility)
Multi-threading
 Börger and Schulte [10] and Cenciarelli et al. [13]
 Multi-threading at the Java level
 The study of the issues left open by the official Sun documentation
The compiler
 Diehl[21]
 Compilation schemes for a subset of Java that excludes exception handling, multi-threading and garbage collection to the corresponding subset of the JVM
 Operational semantics of this JVM subset
 Rose[50]
 Natural semantics of a subset of Java
 Static type systems for both(Java, JVM)
 A specification of the compiler for the subsets
The Abstract State Machine approach 1/3
 Börger and Schulte
 Working on formal specifications of Java, JVM,
Compiler
 Based on the Abstract State Machine formalism
 Full semantic account: in Gurevich[29]
 Specify a modular semantics of a subset of the JVM[11],
a subset of Java[10]
 Modular approach
 The two subsets do not entirely coincide
The Abstract State Machine approach 2/3
 [7]
 Reduces the subsets of Java and the JVM by omitting multi-threading, class loading and arrays.
 Main result
 Informal theorem stating the correctness of the compiler
 Two papers revisit exception handling and object
initialisation
 [8]: On problems with the initialisation of objects
 [9]: exception handling mechanism of java, the JVM, and the
Compiler
 Main result: Formulation of the correctness of compiling
exception handling, with a full proof
The Abstract State Machine approach 3/3
 Stärk [53]
 Uses the specification of Java and the JVM from Börger and Schulte [11,10]
 Presents a compiler from the imperative core of Java
 Gives a correctness proof of the compiler
 A forthcoming book [6]
 More complete specification of Java, the JVM, the compiler, the byte code
 [9]
 Mechanical checking of the specification
 Wallace [60]
 Includes multi-threading, exception handling
 Excludes class loading and garbage collection
Java extensions
 The safety of Java programs
 By using program verification techniques
 Fewer design and implementation problems
 Smart cards
Model checking 1/3
 Demartini et al. [18], Havelund et al. [31]
 Show how core features of Java can be mapped onto the Promela language of the SPIN model checker.
 Cover multi-threading and objects (Havelund et al. also model exceptions).
 Model the objects using Promela’s arrays (one array element per instance of the class)
 The resulting models quickly grow too large to model check effectively
 Only check for safety properties (assertions, deadlock)
 Do not provide support for the checking of liveness properties
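The array encoding of objects can be illustrated in Java itself. This is only a sketch of the idea, not the translation used by either group: the class `ArrayEncodedObjects`, the modelled field `count` and the bound `MAX_INSTANCES` are all hypothetical.

```java
public class ArrayEncodedObjects {
    // Upper bound on live instances; a finite-state model needs one (the value is arbitrary).
    public static final int MAX_INSTANCES = 3;

    // One array per field of the modelled class: element i holds the field of instance i.
    private final int[] count = new int[MAX_INSTANCES];
    private int allocated = 0;

    // Models object creation: the returned index stands in for the object reference.
    public int alloc() {
        if (allocated == MAX_INSTANCES) {
            throw new IllegalStateException("model bound exceeded");
        }
        count[allocated] = 0;
        return allocated++;
    }

    // Models a method that increments the field of the instance behind `ref`.
    public void increment(int ref) {
        count[ref]++;
    }

    // Models a read of the field of the instance behind `ref`.
    public int get(int ref) {
        return count[ref];
    }

    public static void main(String[] args) {
        ArrayEncodedObjects heap = new ArrayEncodedObjects();
        int a = heap.alloc();
        int b = heap.alloc();
        heap.increment(a);
        heap.increment(a);
        heap.increment(b);
        System.out.println(heap.get(a) + " " + heap.get(b));
    }
}
```

Every field of every possible instance becomes part of the model state, which gives some intuition for why such models grow quickly.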
Model checking 2/3
 One of the most useful features of the SPIN model checker
 Its ability to display scenarios leading to problems(deadlock)
 Demartini et al.
 Relate these scenarios back to the original Java sources
 More user friendly than the system of Havelund et al.
Model checking 3/3
 Jensen et al[33]
 Use model checking to verify properties of Java
programs, more abstract approach
 Static analysis techniques
 To reduce a Java program to a control flow graph
 Method calls, method returns, assertions
 Defines the state transitions of the abstract Java
program
 Example[38]
 How the system can be used to model Java’s sandbox
 The stack inspection introduced by Java 2
Theorem proving 1/3
 Detlefs et al.: Modula 3 [20], Java [52]
 Offer more than type checking by requiring the programmer to annotate programs with pre- and post-conditions.
 The compiler is able to generate and prove the verification conditions.
 The system of Detlefs et al.
 does not require the programmer to annotate programs with loop invariants and variants
 derives loop invariants automatically
 assumes that loops are executed at most once
 Power
 The type checker < the system < full verification
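To make the notion of pre- and post-conditions concrete, here is a minimal sketch. It is not the annotation syntax of the system of Detlefs et al.: the conditions are written as comments and enforced by ordinary runtime checks, whereas a static checker would try to prove them before the program runs. The class `ContractDemo` and the method `max` are illustrative.

```java
public class ContractDemo {
    // Pre-condition:  a is non-null and non-empty.
    // Post-condition: the result occurs in a, and no element of a is larger.
    public static int max(int[] a) {
        if (a == null || a.length == 0) {
            throw new IllegalArgumentException("pre-condition violated");
        }
        int m = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > m) {
                m = a[i];
            }
        }
        // Runtime stand-in for the post-condition a verifier would prove statically.
        boolean occurs = false;
        for (int x : a) {
            if (x > m) {
                throw new IllegalStateException("post-condition violated");
            }
            if (x == m) {
                occurs = true;
            }
        }
        if (!occurs) {
            throw new IllegalStateException("post-condition violated");
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println(max(new int[]{3, 9, 4}));
    }
}
```

Proving the post-condition for the loop in full generality needs a loop invariant (here: `m` is the maximum of the elements seen so far), which is the kind of annotation the system tries to spare the programmer.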
Theorem proving 2/3
 The LOOP project of Jacobs et al
 Full verification of Java programs
 Use a denotational semantics based tool to translate Java into the higher order logic of widely used theorem provers (PVS [32], Isabelle/HOL [57])
 Properties
 Termination of a method
 Invariants on the fields of a class
Theorem proving 3/3
 Poetzsch-Heffter and Müller [47]
 An operational/axiomatic semantics of a subset of Java
 prove the soundness of the axiomatic semantics with
respect to the operational semantics.
 embedded in HOL
 Mechanical checking of the soundness proof would be
feasible.
 Moore[39]
 A new version of a small subset of Cohen’s
specification[15] of the JVM
 Shows what the ACL2 theorem prover is capable of
Controlling type casts
 Java’s lack of parametric polymorphism
 Requires programmers to insert type casts in their programs
 Example
 When storing an object of a user class MyObject, one must remember to cast the raw object back into the user class MyObject when retrieving the information
 Erroneous type casts cause unexpected runtime exceptions
 Pizza[46] and Generic Java[12]
 Automatically inserting the required type casts.
 Generic Java
 No cast inserted by the compiler will fail
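The difference between hand-written and compiler-inserted casts can be sketched with the generics that later became part of Java. The class `CastDemo` and its methods are illustrative; `MyObject` is the user class named on the slide, given a hypothetical `name` field here.

```java
import java.util.ArrayList;
import java.util.List;

public class CastDemo {
    static class MyObject {
        final String name;
        MyObject(String name) { this.name = name; }
    }

    // Raw container: elements come back as Object, so the programmer must cast by hand.
    // Returns true if the hand-written cast failed at run time.
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static boolean rawCastFails() {
        List raw = new ArrayList();
        raw.add("not a MyObject"); // nothing stops the wrong type from being stored
        try {
            MyObject o = (MyObject) raw.get(0);
            return o == null; // not reached: the cast above throws
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Generic container: the compiler inserts the cast and rejects ill-typed insertions.
    public static String genericLookup() {
        List<MyObject> typed = new ArrayList<>();
        typed.add(new MyObject("card"));
        // typed.add("not a MyObject"); // would be a compile-time error
        return typed.get(0).name;
    }

    public static void main(String[] args) {
        System.out.println(rawCastFails());
        System.out.println(genericLookup());
    }
}
```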
Controlling execution time
 Ideally, Java safety would guarantee that computations terminate (within certain bounds)
 Denial of service attacks would then be prevented
 Execution time is one of the most difficult resources to control.
Code certification 1/2
 Necula and Lee [40]: proof carrying code (PCC)
 Automatic verification technique (assembly level programs)
 The producer
 expresses a safety property in terms of pre and post conditions on the program
 annotates the program with loop invariants etc.
 generates a proof of the safety property (by hand or using a mechanical proof assistant)
 The consumer
 receives the code and the proof
 mechanically checks that the proof is consistent with the program
 The program satisfies the safety property
 Does not need to trust the producer
 relies only on a small trusted infrastructure (type checker)
Code certification 2/2
 The problems of the PCC approaches
 The size of proofs: exponential in the size of the program [42]
 The amount of redundancy
 Necula and Lee[41]
 Reduce a proof of size n to a proof of size √n by avoiding
some redundancy
 Program verification requires special skills
 To formulate properties
 To discover appropriate loop invariants
 To drive mechanical theorem provers etc.
 It is essential that tools are automatic, or at least require as
little programmer intervention as possible
Small footprint devices
 Small footprint devices
 Mobile phones, PDAs, K Virtual Machine: 128KB of RAM
 Smart card
 A few hundred bytes of RAM & a dozen or so KB of EEPROM
 Java-Card VM (JCVM)
 3 disadvantages
 The full potential and flexibility of client server software development cannot be realised
 Java applets running on the smallest embedded controllers cannot be verified appropriately before they are run
 The freedom of code migration is restricted
 Based on the Split VM concept
 Pushes part of the byte code verification from the loading to the compilation/linking phase.
 JVM byte code ☞ JCVM format
 The conversion verifies the byte code, optimises it and prepares it for loading into the device.
Byte code compression
 Clausen et al[14]
 Retain JVM byte codes
 Propose to compress them for the benefit of embedded
systems
 The compression technique
 Commonly occurring sequences of instructions are replaced by a new ‘macro’ instruction
 30% loading time increase ☞ 30% space saving
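The macro idea can be sketched as follows. This is not the algorithm of Clausen et al.: it handles only pairs of instructions, ignores operands and branch targets, and the class `MacroCompressDemo`, its methods and the macro name `macro_0` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MacroCompressDemo {
    // Finds the adjacent instruction pair that occurs most often (first such pair on ties).
    // Returns it as "first second", or null if the code has fewer than two instructions.
    public static String mostCommonPair(List<String> code) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int i = 0; i + 1 < code.size(); i++) {
            String pair = code.get(i) + " " + code.get(i + 1);
            int c = counts.merge(pair, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = pair;
            }
        }
        return best;
    }

    // Replaces every non-overlapping occurrence of the pair by a single macro instruction.
    public static List<String> compress(List<String> code, String pair, String macro) {
        String[] parts = pair.split(" ");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            boolean match = i + 1 < code.size()
                    && code.get(i).equals(parts[0])
                    && code.get(i + 1).equals(parts[1]);
            if (match) {
                out.add(macro);
                i += 2;
            } else {
                out.add(code.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> code = List.of("aload_0", "getfield", "iconst_1", "iadd",
                                    "aload_0", "getfield", "istore_1");
        String pair = mostCommonPair(code);
        System.out.println(pair);
        System.out.println(compress(code, pair, "macro_0"));
    }
}
```

The loader has to expand each macro again (or the interpreter has to understand it), which is where the extra loading time comes from.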
Class file conversion 1/3
 Hartel et al. [30]: the Java Secure Processor (JSP)
 Provide a complete specification of an early version of the JCVM
 Excludes multi-threading, garbage collection and exception handling
 Validated using the letos tool
 Methodological point [56]
 Earlier JSP
 the full JVM ☞ cutting back unwanted features.
 Newer KVM
 Scratch ☞ adding features as required.
 The developers of the picoPERC version of the JVM [44]
 offer a core VM (64KB)
 provide tools to add further functionality to the core VM
Class file conversion 2/3
 Lanet and Requet[35]
 B-method
 To study one particular aspect of the conversion from JVM to
JCVM code
 Their results include
1. A specification of the constraints imposed by the byte code
verifier for a small subset of the JVM
2. A specification of the semantics of this subset of the JVM byte
codes
3. A specification of the semantics of the corresponding subset of
the JCVM byte codes
4. A proof that the specification of the JCVM subset is a data
refinement of the JVM subset
Class file conversion 3/3
 Denney and Jensen [19]
 Study an aspect of the conversion complementary to that studied by Lanet and Requet
 The conversion of JVM class files to JCVM class files by ‘tokenisation’
 Replaces names in the class files by tokens
 Reduces the size of the class files
 Speeds up the loading process
 Use the Coq theorem prover to mechanically check their proofs.
 Use an elegant method to parameterise their operational semantics over name resolution
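Tokenisation can be pictured as replacing each symbolic name by a small integer. The sketch below is a simplification under that assumption; it is not the Java Card converter, and the class `TokeniseDemo` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokeniseDemo {
    // Assigns each distinct name a token, numbered in order of first appearance.
    public static Map<String, Integer> buildTable(List<String> names) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (String name : names) {
            table.putIfAbsent(name, table.size());
        }
        return table;
    }

    // Rewrites a sequence of symbolic references into the corresponding tokens.
    public static List<Integer> tokenise(List<String> names, Map<String, Integer> table) {
        List<Integer> tokens = new ArrayList<>();
        for (String name : names) {
            tokens.add(table.get(name));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> refs = List.of("java/lang/Object", "MyApplet", "java/lang/Object");
        Map<String, Integer> table = buildTable(refs);
        System.out.println(tokenise(refs, table));
    }
}
```

Name resolution on the device then becomes a table lookup on an integer instead of a string comparison, which is consistent with the size and loading-time benefits listed above.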
Byte code verification revisited 1/2
 Split VM concept
 Off-line verification: signing the results digitally (signature)
 Posegga and Vogt [49,48]
 Propose to use a model checker (SMV) to perform off-line byte code verification for smart cards.
 Posegga et al. [27]
 Propose to implement a tiny proof checker on a smart card.
Byte code verification revisited 2/2
 Rose and Rose [51]
 Use Necula and Lee’s proof carrying code (PCC) method to ‘split’ the byte code verifier.
1. The verification
 Reconstructs the types associated with all local variables and stack locations of JVM code
2. The certification
 Checks, based on the reconstructed types, that each instruction is correctly typed.
 Advantage
1. The certification process is simple
2. Only the certification needs to be trusted, not the verification
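The split can be sketched on a toy stack machine. This is an illustration of the idea only, not the scheme of Rose and Rose: the four instructions, the string encoding of stack typings ('I' for int, 'A' for reference) and the class `SplitVerifierDemo` are all assumptions made for the example, and branches and local variables are left out.

```java
import java.util.Arrays;
import java.util.List;

public class SplitVerifierDemo {
    // Typing rule for one instruction: maps the stack typing before it to the typing
    // after it, or returns null if the instruction is ill-typed on that stack.
    static String step(String stack, String op) {
        switch (op) {
            case "ipush": return stack + "I";
            case "apush": return stack + "A";
            case "iadd":  return stack.endsWith("II") ? stack.substring(0, stack.length() - 1) : null;
            case "pop":   return stack.isEmpty() ? null : stack.substring(0, stack.length() - 1);
            default:      return null; // unknown instruction
        }
    }

    // Verification (untrusted, off-device): reconstructs the stack typing at every
    // program point. Returns null if the code cannot be typed.
    public static String[] reconstruct(List<String> code) {
        String[] types = new String[code.size() + 1];
        types[0] = "";
        for (int i = 0; i < code.size(); i++) {
            types[i + 1] = step(types[i], code.get(i));
            if (types[i + 1] == null) {
                return null;
            }
        }
        return types;
    }

    // Certification (trusted, on-device): checks each instruction against the typings
    // it was shipped with, looking only at one instruction at a time.
    public static boolean certify(List<String> code, String[] cert) {
        if (cert == null || cert.length != code.size() + 1 || !"".equals(cert[0])) {
            return false;
        }
        for (int i = 0; i < code.size(); i++) {
            if (cert[i] == null) {
                return false;
            }
            String next = step(cert[i], code.get(i));
            if (next == null || !next.equals(cert[i + 1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> code = Arrays.asList("ipush", "ipush", "iadd", "pop");
        String[] cert = reconstruct(code);
        System.out.println(Arrays.toString(cert));
        System.out.println(certify(code, cert));
    }
}
```

On straight-line code like this the two passes do similar work. The benefit of the split is expected to show once branches are present, where reconstructing types may need iteration while checking supplied types remains a single pass. A forged or mistaken certificate is simply rejected, so the reconstruction step does not have to be trusted.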
Conclusions
 On modelling garbage collection, and the Java API.
 On building more appropriate theories for programming language semantics modelling.
 On simplifying and modularising the individual components of Java implementations.
 On reducing the size of the trusted computing base, so that flaws are less likely to compromise the security of the system as a whole.
 On considering formal specification, validation and provably correct implementation as a whole, rather than in separation.
 On presenting clear and concise formalisations of systems, which are accessible to the designers and implementors of these systems.
 On using machine-readable specifications.