No More Pain for XML’s Gain
XJ: Facilitating XML Processing in Java

Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar

Itay Maman
236826 Seminar lecture, 15 June 2005

The basic premise
• XML is getting increasingly popular
• XML manipulation is now a common programming task
• The lead question:
  – Do modern OO languages sufficiently support XML?

Introduction: Schema file
(file: technioncatalog.xsd)

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="catalog">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="course" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="points" type="xs:int"/>
                  <xs:element name="number" type="xs:int"/>
                  <xs:element name="name" type="xs:string"/>
                  <xs:element name="teacher" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Introduction: XML document
(file: short.xml)

    <?xml version="1.0" encoding="UTF-8"?>
    <catalog>
      <course>
        <points>3</points>
        <number>234319</number>
        <name>Programming Languages</name>
        <teacher>Ron Pinter</teacher>
      </course>
      <course>
        <points>3</points>
        <number>234141</number>
        <name>Combinatorics for CS</name>
        <teacher>Ran El-Yaniv</teacher>
      </course>
    </catalog>

Desired output:
“Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”

Introduction: The XJ program

    import java.io.*;
    import technioncatalog.*;

    public class Demo1 {
        public static void main(String[] args) throws Throwable {
            catalog cat = new catalog(new File("short.xml"));
            // XPath positions are 1-based: this selects the second course
            catalog.course c = cat [| /course[2] |];
            printCourse(c);
        }

        private static void printCourse(catalog.course c) {
            String name = c [| /name |];
            String teacher = c [| /teacher |];
            int points = c [| /points |];
            int id = c [| /number |];
            System.out.println(name + " (" + id + ") by " + teacher
                + ", " + points + " credit points");
        }
    }

Traditional XML processing (DOM, XPath APIs)

    import javax.xml.parsers.*;
    import javax.xml.xpath.*;
    import org.w3c.dom.*;

    public static void main(String[] args) throws Throwable {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new java.io.File("short.xml"));
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
        // DOM node lists are 0-based: item(1) is the second course
        printCourse(nodes.item(1));
    }

• The types of the XML objects (Node, Document) do not reflect the schema
• The XPath query is a plain string. It may be:
  – Syntactically incorrect
  – Incompatible with the document

Traditional XML processing (DOM APIs)

    private static void printCourse(Node n) {
        // Odd indices only: the even-indexed children are whitespace text nodes
        NodeList nodes = n.getChildNodes();
        System.out.println(nodes.item(5).getTextContent()
            + " (" + nodes.item(3).getTextContent()
            + ") by " + nodes.item(7).getTextContent()
            + ", " + nodes.item(1).getTextContent()
            + " credit points");
    }

• Assumption: the node at index 3 is the course number
• Assumption: 2nd child has no child elements
• Assumption: four child elements must exist
• What about reading the numeric value of an element?
• These assumptions will not hold if the schema is changed
  – => run-time errors
  – Problems remain, even if we identify nodes by name
• Possible Schema changes:
  – Allowing a new optional <students> sub-element
  – Changing the order of the sub-elements

No easy solution
• Similar problems occur when:
  1. XML elements are created by the program
  2. Other libraries are used for reading/writing XML documents
     – Such as: Xalan, SAX
  3. The developer wraps several complex operations within a single function/method/class
• These are inherent problems of the language

Shaping the future
• What XML-related facilities do we want?
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class-values

Has the future arrived yet?
• Significant effort has gone into integrating XML into modern programming languages
  – XJ
  – Scala
  – Cω
  – Xtatic
  – …
• We will overview the constructs offered by XJ
  – A super-set of Java
  – Available at: http://www.research.ibm.com/xj

XJ’s Type system
• Hierarchy of classes
  – A common root class: XMLObject
  – Automatic import: package com.ibm.xj.*
• Genericity: Sequence<T>, XMLCursor<T>
  – XMLCursor<T> is a Sequence<T> iterator

Integration with Schema
• The rationale:
  1. An OO program is a collection of class definitions
  2. A Schema file is a collection of type definitions
  • => let’s integrate these definitions
• Any Schema type is also an XJ type
  – The XJ compiler generates a “logical class” for each such type
  – Schema file == package name
  – Using a schema == import schema_file_name;

XML literal in XJ code
• Invalid XML content triggers a compile-time error
• Resulting elements are typed!
• Curly braces allow “escaping” back into XJ

    import technioncatalog.*;

    public class Demo2 {
        public static void main(String[] args) throws Throwable {
            String x = "Algorithms 1";
            int y = 234247;
            catalog cat = buildCatalog(new catalog.course(
                <course><points>3</points>
                <number>{y}</number><name>{x}</name>
                <teacher>Shlomo Moran</teacher></course>));
        }

        private static catalog buildCatalog(catalog.course c) {
            return new catalog(<catalog>{c}</catalog>);
        }
    }

An ill-typed program

    ...
    // Wrong <course> element: the required sub-elements are missing
    course c = new course(<course>
        <teacher>Shlomo Moran</teacher></course>);
    buildCatalog(c);

    XMLObject x = new course.teacher(
        <teacher>Shlomo Moran</teacher>);
    // An XMLObject cannot be passed as a course element
    buildCatalog(x);
    ...

    private static catalog buildCatalog(catalog.course c) {
        return new catalog(<catalog>{c}</catalog>);
    }

Embedding XPath Queries in XJ
• Syntax: XmlValue [| XPathQuery |]

    course doSomething(catalog cat, int courseNum) {
        return cat [| /course[./number = $courseNum] |];
    }

• Requires: a context-provider:
  – An XML element over which the XPath query is invoked
  – (see the cat variable in the sample)
• Escaping: use a ‘$’ prefix

XPath Semantics
• Problem: the resulting type is sometimes not so clear
• Two options
  – Sequence<T>
    • If the compiler determines that all result elements are of type T
  – Sequence<XMLObject>
    • (Otherwise)
• Automatic conversion from a singleton sequence
• Static check of XPath queries
  – If the result is always empty => compile-time error
  – (The compiler cannot catch all cases)

Implicit coercions
• An atomic XML value can be seamlessly converted into a corresponding Java value
  – xsd:double => double
  – xsd:boolean => boolean
  – xsd:string => java.lang.String
  – …
• This reduces the verbosity of XML-related code:

    import technioncatalog.*;
    import technioncatalog.catalog.*;

    public static String getTeacher(course c) {
        // Sequence<teacher> ► teacher ► String
        return c [| /teacher |];
    }

Updates: Assignment to Query Result

    public static void changePoint(catalog.course c, int p) {
        c [| /points |] = p;
    }

• An XPath expression returns a reference to an existing element
  – (No copying is involved)
  – Consistent with Java’s semantics for objects
• Thus, it can be assigned to
  – An XPath expression is a legal lvalue
• Bulk assignment
  – Occurs when the XPath expression denotes a sequence
  – Bulk assignment operator := allows multiple assignments
  – Double the credit points of each course:

    cat [| //points |] *:= 2;

Tree structure update
• Class XMLObject also defines methods, such as:
  – insertAfter()
  – insertBefore()
  – insertAsFirst()
  – detach()

    public static void addCourse(catalog cat) {
        course c = new course(<course><points>4</points>
            <number>234111</number><name>Introduction to CS</name>
            <teacher>Roy Friedman</teacher></course>);
        cat.insertAsLast(c);
    }

• Which object is being modified?

Problems: Type Consistency
• Definitions
  1. An XML update operation, u, is a mapping over XML values
     • u: T1 -> T2
  2. An update is consistent if T1 = T2
• Ideally, a compile-time error should be triggered for each inconsistent update in the program
  – Why do we want the two types to be equal?
• Unfortunately, this cannot be promised
  – Can you think of an example?
• The solution: Additional run-time check

Problems: Covariant subtyping (1/2)
• Covariance: a type in a method signature changes in the same direction as the inheritance hierarchy

    class X { }
    class A { public void m(X x) { } }
    class X1 extends X { }
    class A1 extends A { public void m(X1 x) { } }
    ...
    A a = new A1();
    a.m(new X());

• A1.m() is “spoiled”: it requires only X1 objects
• Which method should be invoked: A.m() or A1.m()?
• Java favors type-safety: a method with covariant arguments is considered to be an overloading rather than an overriding
  – Same approach is taken by C++, C#
• But, covariance is allowed for arrays
  – Array assignments may fail at run-time

Problems: Covariant subtyping (2/2)
(Now let us get back to our technioncatalog schema…)
• A <course> value is also spoiled
  – It requires unique children: <points>, <name>, etc.
• But, it also has an unspoiled super-class: XMLObject
  – All updates to XMLObject are legal at compile-time
• The following code compiles successfully:

    public static void trick(course c) {
        XMLObject x = c;
        points p = new points(<points>4</points>);
        x.appendAsLast(p);   // Run-time error is here !!
    }
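
The two Java behaviours named on the covariant-subtyping slides can be checked in plain Java. The following is a minimal sketch, not part of the original slides: the wrapper class CovarianceDemo and the helper methods dispatch() and arrayStoreFails() are illustrative names, and m() returns a String only so the chosen method is visible. It shows that A1.m(X1) is treated as an overload, so the call through an A reference resolves to A.m(X), and that a covariant array assignment compiles but fails at run-time, much like the appendAsLast() call through an XMLObject reference.

```java
public class CovarianceDemo {
    static class X { }
    static class X1 extends X { }

    static class A {
        String m(X x) { return "A.m(X)"; }
    }

    static class A1 extends A {
        // Narrower parameter type: Java treats this as an overload of m,
        // not as an override of A.m(X).
        String m(X1 x) { return "A1.m(X1)"; }
    }

    // The static type of 'a' is A, which declares only m(X);
    // A1 does not override it, so A.m(X) runs.
    static String dispatch() {
        A a = new A1();
        return a.m(new X());
    }

    // Arrays are covariant: an X1[] may be viewed as an X[],
    // but storing a plain X into it is rejected at run-time.
    static boolean arrayStoreFails() {
        X1[] specific = new X1[1];
        X[] general = specific;       // compiles
        try {
            general[0] = new X();     // throws ArrayStoreException
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch());         // prints "A.m(X)"
        System.out.println(arrayStoreFails());  // prints "true"
    }
}
```

The array case is the closer analogy to trick(): the compiler accepts the update because the static type (X[] there, XMLObject in XJ) is "unspoiled", and the violation is only caught by a run-time check.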

Shaping the future (revisited)
• Language constructs seen so far
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class-values

XPath expression as first-class-values
• What is a first-class-value?
  – A value that can be used “naturally” in the program
    • Passed as an argument
    • Stored in a variable/field
    • Returned from a method
    • Created
• In XJ, XPath expressions do not meet these conditions
  – The main obstacle: the XPath part of the expression cannot be separated from its context provider

XPath expression as first-class-values (cont’d)
• Let’s speculate on XPath as an FCV…
• (Following code IS NOT a legal XJ program)

    private static Sequence<teacher> teachers;

    static Sequence<teacher> find(XPath<catalog,teacher> q) {
        catalog c = new catalog(new File("file1.xml"));
        return q.evaluate(c);
    }

    static void main(String[] args) {
        Sequence<teacher> all = find(<catalog>[| //teacher |]);
        Sequence<teacher> few = find(
            <catalog>[| //course[number = 234319]/teacher |] );
    }

XPath expression as first-class-values (cont’d)
• Operators on XPath values
  – Composition
  – Conjunction
  – Disjunction
• These operators will allow the developer to easily create a rich array of safe XPath values
• The compiler must keep track of the type of each such value
  – Basically, an XPath value is a function T -> R, where both T and R are subclasses of XMLObject
  – When two XPath values are composed, the result type is deduced from the types of the operands

Scala: Composition of XML elements
• In Scala, types can be defined in a DTD file
  – A DTD can be translated into Scala classes via the dtd2scala utility
• Scala offers two options for composition of XML elements:
  – Using XML notation (similar to XJ)
  – Using case-class construction notation:

    import Data._;       // import generated definitions
    import scala.xml._;  // for creating PCDATA nodes

    object Main with Application {
        val x = course(teacher(Text("Ran El-Yaniv")),
                       points(Text("3")),
                       name(Text("Combinatorics for CS")),
                       number(Text("234141")));
        Console.println(x);
    }

Typed, named methods/fields
• Usually, values aggregated by a Java object are accessed by fields/methods
  – Can we access XML sub-elements this way?
  – (Following code IS NOT a legal XJ program)

    import technioncatalog.*;

    void printTeachers(catalog cat) {
        for (int i = 0; i < cat.courses.length; ++i) {
            catalog.course c = cat.courses[i];
            System.out.println(c.teacher);
        }
    }

Typed, named methods/fields (cont’d)
• Some of the difficulties:
  – Sub-elements are not always named
  – Schema supports optional types: <xsd:choice>
    • How can Java express an “optional” field?
• Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types
  – Missing features: virtual fields, inheritance without polymorphism
  – Other features can be found in functional languages
    • E.g.: variant types, immutability, structural conformance
    • But, their popularity lags behind

Summary
• XJ is a Java extension that has built-in support for XML
  – Type safety: many things are checked at compile time
  – Ease of use
• OO languages are not powerful enough (in terms of typing)
  – Some type information is lost in the transition Schema -> Java

- The End -