No More Pain for XML’s Gain
XJ: Facilitating XML Processing in Java
Matthew Harren
Mukund Raghavachari
Oded Shmueli
Michael Burke
Rajesh Bordawekar
Igor Pechtchanski
Vivek Sarkar
Itay Maman
236826 Seminar lecture, 15 June 2005
The basic premise
• XML is becoming increasingly popular
• XML manipulation is now a common programming task
• The key question:
– Do modern OO languages sufficiently support XML?
Introduction: Schema file
(file: technioncatalog.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="catalog">
<xs:complexType>
<xs:sequence>
<xs:element name="course" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="points" type="xs:int"/>
<xs:element name="number" type="xs:int"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="teacher" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Introduction: XML document
(file: short.xml)
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<course>
<points>3</points>
<number>234319</number>
<name>Programming Languages</name>
<teacher>Ron Pinter</teacher>
</course>
<course>
<points>3</points>
<number>234141</number>
<name>Combinatorics for CS</name>
<teacher>Ran El-Yaniv</teacher>
</course>
</catalog>
Desired output:
“Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”
Introduction: The XJ program
import java.io.*;
import technioncatalog.*;
public class Demo1 {
    public static void main(String[] args) throws Throwable {
        // Load the document; the catalog type is generated from technioncatalog.xsd
        catalog cat = new catalog(new File("short.xml"));
        // Typed XPath query: select the second <course> element
        catalog.course c = cat [| /course[2] |];
        printCourse(c);
    }
    private static void printCourse(catalog.course c) {
        String name = c [| /name |];
        String teacher = c [| /teacher |];
        int points = c [| /points |];
        int id = c [| /number |];
        System.out.println(name + " (" + id + ") by "
            + teacher + ", " + points + " credit points");
    }
}
Traditional XML processing
(DOM and XPath APIs)
public static void main(String[] args) throws Throwable {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(new java.io.File("short.xml"));
    XPath xp = XPathFactory.newInstance().newXPath();
    // Cast to the standard org.w3c.dom.NodeList rather than a parser-internal class
    NodeList nodes = (NodeList)
        xp.evaluate("//course", doc, XPathConstants.NODESET);
    printCourse(nodes.item(1));
}
• The types of the XML objects (Node, Document) do not reflect the schema
• The XPath query is a plain string. It may be:
– Syntactically incorrect
– Incompatible with the document
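Both weaknesses of string-based queries can be reproduced with the standard JAXP APIs alone. The following is a minimal sketch in plain Java (not XJ): it inlines a copy of short.xml so that no external file is needed, and the class name StringXPathDemo and the helper count are illustrative names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StringXPathDemo {
    // Inline copy of short.xml (assumption: same content as the slide's file).
    static final String XML =
        "<catalog>"
      + "<course><points>3</points><number>234319</number>"
      + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
      + "<course><points>3</points><number>234141</number>"
      + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
      + "</catalog>";

    // Returns the number of nodes selected by the query,
    // or -1 if the query string is not a well-formed XPath expression.
    static int count(String query) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(query, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            return -1; // syntax error, detected only when the query is evaluated
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//course"));   // query compatible with the document
        System.out.println(count("//cuorse"));   // misspelled name: compiles, selects nothing
        System.out.println(count("//course["));  // malformed: rejected at run time
    }
}
```

The Java compiler accepts all three calls, since each query is just a String; the differences show up only when the program runs.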
Traditional XML processing
(DOM APIs)
private static void printCourse(Node n) {
    NodeList nodes = n.getChildNodes();
    System.out.println(nodes.item(5).getTextContent()
        + " (" + nodes.item(3).getTextContent()
        + ") by " + nodes.item(7).getTextContent()
        + ", " + nodes.item(1).getTextContent() + " credit points");
}
Assumption: 3rd child is the course number
Assumption: 2nd child has no child elements
Assumption: Four child nodes must exist
What about reading the numeric value of an element?
• These assumptions will not hold if the schema is changed
– => run-time errors
– Problems remain, even if we identify nodes by name
• Possible schema changes:
– Allowing a new optional <students> sub-element
– Changing the order of the sub-elements
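The claim that problems remain even when nodes are identified by name can be illustrated with the standard DOM API. This is a sketch under stated assumptions: ByNameDemo, parse and childText are hypothetical helper names, and the <course> fragment is written inline with its children deliberately reordered.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ByNameDemo {
    // Parses an XML fragment and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks up a sub-element by name; returns null when no such element exists.
    static String childText(Element parent, String name) {
        Node n = parent.getElementsByTagName(name).item(0);
        return n == null ? null : n.getTextContent();
    }

    public static void main(String[] args) {
        // Children reordered relative to the schema: lookup by name still works.
        Element course = parse("<course><teacher>Ron Pinter</teacher>"
            + "<name>Programming Languages</name><number>234319</number>"
            + "<points>3</points></course>");
        System.out.println(childText(course, "teacher"));

        // The value is still an untyped string; the conversion is manual.
        int points = Integer.parseInt(childText(course, "points"));
        System.out.println(points);

        // A misspelled element name compiles and is noticed only at run time.
        System.out.println(childText(course, "teachr"));
    }
}
```

Lookup by name survives a reordering of the sub-elements, but the element names are still unchecked strings and the values are still untyped text.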
No easy solution
• Similar problems occur when:
1. XML elements are created by the program
2. Other libraries are used for reading/writing XML documents
– Such as: Xalan, SAX
3. The developer wraps several complex operations within a single function/method/class
• These are inherent problems of the language
Shaping the future
• What XML-related facilities do we want?
– Typed XML objects
– Seamless translation of a Schema/DTD into a Java type
– Two composition techniques
• XML notation
• Java’s object creation syntax
– Two decomposition techniques
• Typed XPath
• Typed, named methods/fields
– XPath expressions as first-class values
Has the future arrived yet?
• Significant effort has gone into integrating XML into modern programming languages
– XJ
– Scala
– Cω
– XTatic
– …
• We will overview the constructs offered by XJ
– A superset of Java
– Available at: http://www.research.ibm.com/xj
XJ’s Type system
• Hierarchy of classes
– A common root class: XMLObject
– Automatic import: package com.ibm.xj.*
• Genericity: Sequence<T>, XMLCursor<T>
– XMLCursor<T> is a Sequence<T> iterator
Integration with Schema
• The rationale:
1. An OO program is a collection of class definitions
2. A Schema file is a collection of type definitions
• => Let’s integrate these definitions
• Any Schema type is also an XJ type
– The XJ compiler generates a “logical class” for each such type
– Schema file == package name
– Using a schema == import schema_file_name;
XML literal in XJ code
• Invalid XML content triggers a compile-time error
• Resulting elements are typed!
• Curly braces allow “escaping” back into XJ
import technioncatalog.*;
public class Demo2 {
    public static void main(String[] args) throws Throwable {
        String x = "Algorithms 1"; int y = 234247;
        catalog cat = buildCatalog(new catalog.course(
            <course><points>3</points>
            <number>{y}</number><name>{x}</name>
            <teacher>Shlomo Moran</teacher></course>));
    }
    private static catalog buildCatalog(catalog.course c) {
        return new catalog(<catalog>{c}</catalog>);
    }
}
An ill-typed program
...
// Wrong <course> element
course c = new course(<course>
    <teacher>Shlomo Moran</teacher></course>);
buildCatalog(c);
// An XMLObject cannot be passed as a course element
XMLObject x = new course.teacher(
    <teacher>Shlomo Moran</teacher>);
buildCatalog(x);
...
private static catalog buildCatalog(catalog.course c) {
    return new catalog(<catalog>{c}</catalog>);
}
Embedding XPath Queries in XJ
• Syntax: XmlValue [| XPathQuery |]
course doSomething(catalog cat, int courseNum) {
    return cat [| /course[./number = $courseNum] |];
}
• Requires a context provider:
– An XML element over which the XPath query is invoked
– (See the cat variable in the sample)
• Escaping: use a ‘$’ prefix
XPath Semantics
• Problem: the resulting type is sometimes not so clear
• Two options
– Sequence<T>
• If the compiler determines that all result elements are of type T
– Sequence<XMLObject>
• (Otherwise)
• Automatic conversion from a singleton sequence
• Static check of XPath queries
– If the result is always empty => compile-time error
– (The compiler cannot catch all cases)
Implicit coercions
• An atomic XML value can be seamlessly converted into a corresponding Java value
– xsd:double => double
– xsd:boolean => boolean
– xsd:string => java.lang.String
– …
• This reduces the verbosity of XML-related code:
import technioncatalog.*;
import technioncatalog.catalog.*;
public static String getTeacher(course c) {
    return c [| /teacher |];
}
Sequence<teacher> ► teacher ► String
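For comparison, here is a sketch of what the same conversions look like with the standard JAXP XPath API, where the programmer picks the result type by hand at each call site. CoercionDemo and its method names are illustrative; only the javax.xml and org.w3c.dom calls are real library APIs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CoercionDemo {
    // One <course> element from short.xml, inlined for self-containment.
    static final String COURSE =
        "<course><points>3</points><number>234141</number>"
      + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>";

    static Element course() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(COURSE))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String result: the two-argument evaluate() returns the node's string value.
    static String teacher(Element course) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate("teacher", course);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Numeric result: requested explicitly, returned as Double, narrowed by hand.
    static int points(Element course) {
        try {
            Double d = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("points", course, XPathConstants.NUMBER);
            return d.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element c = course();
        System.out.println(teacher(c) + ", " + points(c) + " credit points");
    }
}
```

Nothing here ties the requested result type to the schema: asking for a number on the teacher element would compile just as well.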
Updates: Assignment to Query Result
public static void changePoint(catalog.course c, int p) {
    c [| /points |] = p;
}
• An XPath expression returns a reference to an existing element
– (No copying is involved)
– Consistent with Java’s semantics for objects
• Thus, it can be assigned to
– An XPath expression is a legal lvalue
• Bulk assignment
– Occurs when the XPath expression denotes a sequence
– Bulk assignment operator := allows multiple assignments
– Double the credit points of each course:
cat [| //points |] *:= 2;
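A rough plain-Java counterpart of the bulk assignment above, written against the standard DOM and XPath APIs, might look as follows. It is a sketch, not XJ's implementation; BulkUpdateDemo and doublePoints are hypothetical names, and the method returns the updated values only so that the effect is easy to observe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BulkUpdateDemo {
    // Selects every <points> element in the document.
    static NodeList selectPoints(Document doc) throws Exception {
        return (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("//points", doc, XPathConstants.NODESET);
    }

    // Doubles the text of each <points> element in place, then reads the values back.
    static List<Integer> doublePoints(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList pts = selectPoints(doc);
            for (int i = 0; i < pts.getLength(); i++) {
                Node p = pts.item(i);
                int value = Integer.parseInt(p.getTextContent().trim());
                p.setTextContent(String.valueOf(value * 2)); // updates the existing node
            }
            List<Integer> result = new ArrayList<>();
            NodeList updated = selectPoints(doc);
            for (int i = 0; i < updated.getLength(); i++) {
                result.add(Integer.parseInt(updated.item(i).getTextContent()));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(doublePoints(
            "<catalog><course><points>3</points></course>"
          + "<course><points>4</points></course></catalog>"));
    }
}
```

As in XJ, the query result refers to the existing nodes, so the update is visible when the document is queried again; the loop and the string-to-int conversions are what the *:= operator hides.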
Tree structure update
• Class XMLObject also defines methods, such as:
– insertAfter()
– insertBefore()
– insertAsFirst()
– detach()
public static void addCourse(catalog cat) {
    course c = new course(<course><points>4</points>
        <number>234111</number><name>Introduction to CS</name>
        <teacher>Roy Friedman</teacher></course>);
    cat.insertAsLast(c);
}
Which object is being modified?
Problems: Type Consistency
• Definitions
1. An XML update operation, u, is a mapping over XML values
• u: T1 -> T2
2. An update is consistent if T1 = T2
• Ideally, a compile-time error should be triggered for each inconsistent update in the program
Why do we want the two types to be equal?
• Unfortunately, this cannot be promised
Can you think of an example?
• The solution: an additional run-time check
Problems: Covariant subtyping (1/2)
• Covariance: the change of type in a signature is in the same direction as that of the inheritance
class X { }
class A { public void m(X x) { } }
class X1 extends X { }
class A1 extends A { public void m(X1 x) { } }
...
A a = new A1();
a.m(new X());
A1.m() is “spoiled”: requires only X1 objects
Which method should be invoked: A.m() or A1.m()?
• Java favors type safety: a method with covariant arguments is considered to be an overloading rather than an overriding
– The same approach is taken by C++ and C#
• But covariance is allowed for arrays
– Array assignments may fail at run-time
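Both points can be checked in plain Java. The sketch below reuses the slide's class names X, X1, A and A1 (nested inside an illustrative CovarianceDemo class, with m returning a String so the chosen method is visible) and adds the array case.

```java
class CovarianceDemo {
    static class X { }
    static class X1 extends X { }
    static class A { String m(X x) { return "A.m(X)"; } }
    // m(X1) has a different parameter type, so it overloads A.m(X) instead of overriding it.
    static class A1 extends A { String m(X1 x) { return "A1.m(X1)"; } }

    // The static type of the receiver is A, so only m(X) is considered at compile time.
    static String viaSupertype() {
        A a = new A1();
        return a.m(new X1());
    }

    // With static type A1 both overloads are visible and the most specific one wins.
    static String viaSubtype() {
        A1 a1 = new A1();
        return a1.m(new X1());
    }

    // Arrays are covariant: the assignment compiles but is checked on every store.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(viaSupertype());    // A.m(X)
        System.out.println(viaSubtype());      // A1.m(X1)
        System.out.println(arrayStoreFails()); // true
    }
}
```

Because A1.m(X1) never overrides A.m(X), a call through a reference of type A can never reach the "spoiled" method with an argument it cannot handle; arrays give up that guarantee and pay for it with a run-time check.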
Problems: Covariant subtyping (2/2)
(Now let us get back to our technioncatalog schema…)
• A <course> value is also spoiled
– It requires unique children: <points>, <name>, etc.
• But it also has an unspoiled super-class: XMLObject
– All updates to XMLObject are legal at compile-time
• The following code compiles successfully:
public static void trick(course c) {
    XMLObject x = c;
    points p = new points(<points>4</points>);
    x.appendAsLast(p); // Run-time error is here!
}
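An analogous situation can be reproduced with standard JAXP, where the untyped Node plays the role of XMLObject and schema validation plays the role of the run-time check. This is only an analogy under stated assumptions, not how XJ is implemented: RuntimeCheckDemo and its methods are hypothetical names, and the schema and document are inlined copies of technioncatalog.xsd and a one-course catalog.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RuntimeCheckDemo {
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"catalog\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"course\" maxOccurs=\"unbounded\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"points\" type=\"xs:int\"/>"
      + "<xs:element name=\"number\" type=\"xs:int\"/>"
      + "<xs:element name=\"name\" type=\"xs:string\"/>"
      + "<xs:element name=\"teacher\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static final String XML =
        "<catalog><course><points>3</points><number>234319</number>"
      + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course></catalog>";

    static Document parse() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed for validating a DOM tree
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The run-time check: does the tree still conform to the schema?
    static boolean isValid(Document doc) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            try {
                schema.newValidator().validate(new DOMSource(doc));
                return true;
            } catch (SAXException invalid) {
                return false;
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counterpart of trick(): the untyped Node API accepts any child.
    static void trick(Node course) {
        Element extra = course.getOwnerDocument().createElementNS(null, "points");
        extra.setTextContent("4");
        course.appendChild(extra);
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(isValid(doc));
        trick(doc.getDocumentElement().getFirstChild());
        System.out.println(isValid(doc));
    }
}
```

The update compiles and executes without complaint; the second <points> child is noticed only when the tree is validated again.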
Shaping the future (revisited)
• Language constructs seen so far
– Typed XML objects
– Seamless translation of a Schema/DTD into a Java type
– Two composition techniques
• XML notation
• Java’s object creation syntax
– Two decomposition techniques
• Typed XPath
• Typed, named methods/fields
– XPath expressions as first-class values
XPath expressions as first-class values
• What is a first-class value?
– A value that can be used “naturally” in the program
• Passed as an argument
• Stored in a variable/field
• Returned from a method
• Created
• In XJ, XPath expressions do not meet these conditions
– The main obstacle: the XPath part of the expression cannot be separated from its context provider
XPath expressions as first-class values
(cont’d)
• Let’s speculate on XPath as an FCV…
• (The following code IS NOT a legal XJ program)
private static Sequence<teacher> teachers;
static Sequence<teacher> find(XPath<catalog,teacher> q) {
    catalog c = new catalog(new File("file1.xml"));
    return q.evaluate(c);
}
static void main(String[] args) {
    Sequence<teacher> all = find(<catalog>[| //teacher |]);
    Sequence<teacher> few = find(
        <catalog>[| //number/234319/../../teacher |] );
}
XPath expressions as first-class values
(cont’d)
• Operators on XPath values
– Composition
– Conjunction
– Disjunction
• These operators will allow the developer to easily create a rich array of safe XPath values
• The compiler must keep track of the type of each such value
– Basically, an XPath value is a function T -> R, where both T and R are subclasses of XMLObject
– When two XPath values are composed, the result type is deduced from the types of the operands
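The typing rule for composition can be modelled in plain Java generics. The sketch below is speculative in the same sense as the slides: Path, then, Catalog and Course are hypothetical names, ordinary Java objects stand in for XMLObject subclasses, and only the composition operator is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TypedPathDemo {
    // A path value is a function from a context of type T to a sequence of R.
    interface Path<T, R> {
        List<R> evaluate(T context);

        // Composition: (T -> R) followed by (R -> S) yields (T -> S).
        // The compiler deduces the result type from the two operand types.
        default <S> Path<T, S> then(Path<R, S> next) {
            return context -> {
                List<S> out = new ArrayList<>();
                for (R r : evaluate(context)) {
                    out.addAll(next.evaluate(r));
                }
                return out;
            };
        }
    }

    // Hand-written stand-ins for the schema-derived types.
    static final class Course {
        final int number;
        final String teacher;
        Course(int number, String teacher) { this.number = number; this.teacher = teacher; }
    }

    static final class Catalog {
        final List<Course> courses;
        Catalog(Course... courses) { this.courses = Arrays.asList(courses); }
    }

    // Two path values, created, stored in fields and passed around like any object.
    static final Path<Catalog, Course> COURSES = cat -> cat.courses;
    static final Path<Course, String> TEACHER = c -> Collections.singletonList(c.teacher);

    static List<String> teachers(Catalog cat) {
        Path<Catalog, String> query = COURSES.then(TEACHER); // type checked statically
        return query.evaluate(cat);
    }

    public static void main(String[] args) {
        Catalog cat = new Catalog(new Course(234319, "Ron Pinter"),
                                  new Course(234141, "Ran El-Yaniv"));
        System.out.println(teachers(cat));
    }
}
```

Composing in the wrong order, for instance TEACHER.then(COURSES), is rejected by the Java compiler, which is the kind of guarantee the slide asks for.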
Scala: Composition of XML elements
• In Scala, types can be defined in a DTD file
– A DTD can be translated into Scala classes via the dtd2scala utility
• Scala offers two options for composition of XML elements:
– Using XML notation (similar to XJ)
– Using case-class construction notation:
import Data._; // import generated definitions
import scala.xml._; // for creating PCDATA nodes
object Main with Application {
  val x = course(teacher(Text("Ran El-Yaniv")),
    points(Text("3")), name(Text("Combinatorics for CS")),
    number(Text("234141")));
  Console.println(x);
}
Typed, named methods/fields
• Usually, values aggregated by a Java object are accessed by fields/methods
– Can we access XML sub-elements this way?
– (The following code IS NOT a legal XJ program)
import technioncatalog.*;
void printTeachers(catalog cat) {
    for (int i = 0; i < cat.courses.length; ++i) {
        catalog.course c = cat.courses[i];
        System.out.println(c.teacher);
    }
}
Typed, named methods/fields
(cont’d)
• Some of the difficulties:
– Sub-elements are not always named
– Schema supports optional types: <xsd:choice>
• How can Java express an “optional” field?
• Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types
– Missing features: virtual fields, inheritance without polymorphism
– Other features can be found in functional languages
• E.g.: variant types, immutability, structural conformance
• But their popularity lags behind
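One way to approximate these Schema features in plain Java is sketched below. Everything in it is a hypothetical hand-written mapping, not something XJ generates: an optional <students> sub-element becomes an Optional field, and a choice between two alternatives (here invented Teacher and Assistant elements) becomes an interface with two implementations.

```java
import java.util.Optional;

class OptionalFieldDemo {
    // An optional sub-element modelled as an Optional field.
    static final class Course {
        final String name;
        final Optional<Integer> students; // empty when <students> is absent
        Course(String name, Optional<Integer> students) {
            this.name = name;
            this.students = students;
        }
    }

    // A choice between alternatives modelled as an interface with two classes.
    interface Instructor { }
    static final class Teacher implements Instructor {
        final String name;
        Teacher(String name) { this.name = name; }
    }
    static final class Assistant implements Instructor {
        final String name;
        Assistant(String name) { this.name = name; }
    }

    static String describe(Course c) {
        return c.name + c.students.map(n -> " (" + n + " students)").orElse("");
    }

    // Unlike a variant type, nothing forces this method to cover every alternative.
    static String describe(Instructor i) {
        if (i instanceof Teacher) {
            return "teacher " + ((Teacher) i).name;
        }
        if (i instanceof Assistant) {
            return "assistant " + ((Assistant) i).name;
        }
        throw new IllegalArgumentException("unknown alternative");
    }

    public static void main(String[] args) {
        System.out.println(describe(new Course("Programming Languages", Optional.of(40))));
        System.out.println(describe(new Course("Combinatorics for CS", Optional.empty())));
        System.out.println(describe(new Teacher("Ron Pinter")));
    }
}
```

The encoding works, but the final throw statement shows what is lost: the set of alternatives is open, so the compiler cannot confirm that all cases of the choice are handled, which is what variant types in functional languages provide.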
Summary
• XJ is a Java extension that has built-in support for XML
– Type safety: many things are checked at compile time
– Ease of use
• OO languages are not powerful enough (in terms of typing)
– Some type information is lost in the transition Schema -> Java
- The End-